QUANTITATIVE CONCENTRATION INEQUALITIES FOR EMPIRICAL
MEASURES ON NON-COMPACT SPACES 

FRANÇOIS BOLLEY, ARNAUD GUILLIN, AND CÉDRIC VILLANI


Abstract. We establish some quantitative concentration estimates for the empirical 
measure of many independent variables, in transportation distances. As an application, 
we provide some error bounds for particle simulations in a model mean field problem.
The tools include coupling arguments, as well as regularity and moments estimates for
solutions of certain diffusive partial differential equations.

Introduction

Large stochastic particle systems constitute a popular way to perform numerical simulations
in many contexts, either because they are used in some physical model (as in e.g.
stellar or granular media) or as an approximation of a continuous model (as in e.g. vortex
simulation for the Euler equation, see [?, Chapter 5] for instance). For such systems one
may wish to establish concentration estimates showing that the behavior of the system is
sharply stabilized as the number $N$ of particles goes to infinity. It is natural to search for
these estimates in the setting of large (or moderate) deviations, since one wishes to make
sure that the numerical method has a very small probability of giving wrong results. From
a physical perspective, concentration estimates may be useful to establish the validity of a
continuous approximation such as a mean-field limit.

Key words and phrases. Transport inequalities, Sanov Theorem. 
When one is interested in the asymptotic behavior of just one, or a few observables (such
as the mean position...), there are efficient methods, based for instance on concentration
of measure theory. As a good example, Malrieu [?] recently applied tools from the fields
of Logarithmic Sobolev inequalities, optimal transportation and concentration of measure,
to prove very neat bounds like

\[
\sup_{\|\varphi\|_{\mathrm{Lip}} \le 1} \mathbb{P}\left[ \left| \frac{1}{N} \sum_{i=1}^{N} \varphi(X_t^i) - \int \varphi \, d\mu_t \right| > \varepsilon \right] \le 2\, e^{-\lambda N \varepsilon^2}. \tag{0.1}
\]

Here $(X_t^i)_{1 \le i \le N}$ stand for the positions of particles (in phase space) at time $t$, $\varepsilon$ is a given
error, $\mathbb{P}$ stands for the probability, $\mu_t$ is a probability measure governing the limit behavior
of the system, and $\lambda > 0$ is a positive constant depending on the particular system he is
considering (a simple instance of McKean-Vlasov model used in particular in the modelling
of granular media). Moreover,


\[
\|\varphi\|_{\mathrm{Lip}} := \sup_{x \neq y} \frac{|\varphi(x) - \varphi(y)|}{d(x,y)},
\]

where $d$ is the distance in phase space (say the Euclidean norm $|\cdot|$ in $\mathbb{R}^d$).

This approach can lead to nice bounds, but has the drawback of being limited to a finite
number of observables. Of course, one may apply (0.1) to many functions $\varphi$, and obtain
something like


P 
where $(\varphi_k)_{k \in \mathbb{N}}$ is an arbitrarily chosen dense family in the set of all 1-Lipschitz functions
converging to 0 at infinity. If we denote by $\delta_x$ the Dirac mass at point $x$, and by

\[
\hat\mu_t^N := \frac{1}{N} \sum_{i=1}^{N} \delta_{X_t^i}
\]

the empirical measure associated with the system (this is a random probability measure),
then estimate (0.2) can be interpreted as a bound on how close $\hat\mu_t^N$ is to $\mu_t$. Indeed,


k=l 


ipk d(ti - v) 


(0.3) 


defines a distance on probability measures, associated with a topology which is at least as
strong as the weak convergence of measures (convergence against bounded continuous test
functions). However, this point of view is deceiving: for practical purposes, the distance $d$
can hardly be estimated, and in any case (0.2) does not contain more information than (0.1):
it is only useful if one considers a finite number of observables.

Sanov’s large deviation principle [12, Theorem 6.2.10] provides a more satisfactory tool
to estimate the distance between the empirical measure and its limit. Roughly speaking,
it implies, for independent variables $X_t^i$, an estimate of the form

\[
\mathbb{P}\left[ \mathrm{dist}(\hat\mu_t^N, \mu_t) > \varepsilon \right] \simeq e^{-N \alpha(\varepsilon)} \qquad \text{as } N \to \infty,
\]

where

\[
\alpha(\varepsilon) := \inf \left\{ H(\nu|\mu_t); \ \mathrm{dist}(\nu, \mu_t) \ge \varepsilon \right\} \tag{0.4}
\]

and $H$ is the relative $H$ functional:

\[
H(\nu|\mu) = \int \frac{d\nu}{d\mu} \log \frac{d\nu}{d\mu} \, d\mu
\]

(to be interpreted as $+\infty$ if $\nu$ is not absolutely continuous with respect to $\mu$). Since $H$
behaves in many ways like a square distance, one can hope that $\alpha(\varepsilon) \ge \mathrm{const.}\, \varepsilon^2$. Here
“dist” may be any distance which is continuous with respect to the weak topology, a
condition which might cause trouble on a non-compact phase space.

Yet Sanov’s theorem is not the final answer either: it is actually asymptotic, and only
implies a bound like

\[
\limsup_{N \to \infty} \frac{1}{N} \log \mathbb{P}\left[ \mathrm{dist}(\hat\mu_t^N, \mu_t) \ge \varepsilon \right] \le -\alpha(\varepsilon),
\]

which, unlike (0.1), does not contain any explicit estimate for a given $N$. Fortunately,
there are known techniques to obtain quantitative upper bounds for such theorems, see
in particular [12, Exercise 4.5.5]. Since these techniques are devised for compact phase
spaces, a further truncation will be necessary to treat more general situations.

In this paper, we shall show how to combine these ideas with recent results about
measure concentration and transportation distances, in order to derive in a systematic
way estimates that are explicit, deal with the empirical measure as a whole, apply to
non-compact phase spaces, and can be used to study some particle systems arising in practical
problems. Typical estimates will be of the form

\[
\mathbb{P}\left[ \sup_{\|\varphi\|_{\mathrm{Lip}} \le 1} \left( \frac{1}{N} \sum_{i=1}^{N} \varphi(X_t^i) - \int \varphi \, d\mu_t \right) > \varepsilon \right] \le C\, e^{-\lambda N \varepsilon^2}. \tag{0.5}
\]

As a price to pay, the constant $C$ in the right-hand side will be much larger than the one
in (0.1).

Here is a possible application of (0.5) in a numerical perspective. Suppose your system
has a limit invariant measure $\mu_\infty = \lim \mu_t$ as $t \to \infty$, and you wish to numerically plot its
density $f_\infty$. For that, you run your particle simulation for a long time $t = T$, and plot,
say,

\[
\frac{1}{N} \sum_{i=1}^{N} \zeta_\alpha(x - X_T^i), \tag{0.6}
\]

where $\zeta_\alpha = \alpha^{-d} \zeta(x/\alpha)$ is a smooth approximation of a Dirac mass as $\alpha \to 0$ (as usual,
$\zeta$ is a nonnegative smooth radial function on $\mathbb{R}^d$ with compact support and unit integral).
With the help of estimates such as (0.5), it is often possible to compute bounds on, say,
the distance between this mollified density and $f_\infty$,
in terms of $N$, $\varepsilon$, $T$ and $\alpha$. In this way one can “guarantee” that all details of the invariant
measure are captured by the stochastic system. While this problem is too general to be 
treated abstractly, we shall show on some concrete model examples how to derive such 
bounds for the same kind of systems that was considered by Malrieu. 
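
As an illustration of the plot described above, here is a minimal Python sketch of a mollified empirical density in dimension $d = 1$. The bump function, the Gaussian sample `X_T` (a stand-in for particle positions at time $T$) and the bandwidth `alpha` are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def bump(u):
    """Unnormalized smooth bump function supported on [-1, 1]."""
    out = np.zeros_like(u, dtype=float)
    inside = np.abs(u) < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - u[inside] ** 2))
    return out

# Normalizing constant so that the mollifier has unit integral (Riemann sum).
_grid = np.linspace(-1.0, 1.0, 20001)
_BUMP_MASS = float(np.sum(bump(_grid)) * (_grid[1] - _grid[0]))

def mollified_density(x, particles, alpha):
    """Evaluate (1/N) * sum_i zeta_alpha(x - X^i), zeta_alpha(z) = zeta(z / alpha) / alpha."""
    x = np.asarray(x, dtype=float)
    particles = np.asarray(particles, dtype=float)
    u = (x[:, None] - particles[None, :]) / alpha
    return bump(u).mean(axis=1) / (alpha * _BUMP_MASS)

rng = np.random.default_rng(0)
X_T = rng.standard_normal(2000)           # hypothetical particle positions
xs = np.linspace(-6.0, 6.0, 2401)
density = mollified_density(xs, X_T, alpha=0.3)
total_mass = float(np.sum(density) * (xs[1] - xs[0]))   # should be close to 1
```

Smaller values of `alpha` resolve finer details of the density but require more particles, which is the trade-off that bounds in terms of $N$, $\varepsilon$, $T$ and $\alpha$ are meant to quantify.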

In the next section, we shall explain our main tools and results; the rest of the
paper will be devoted to the proofs. Some auxiliary estimates of general interest are
postponed to the Appendix.

1. Tools and main results 

1.1. Wasserstein distances. To measure distances between probability measures, we
shall use transportation distances, also called Wasserstein distances. They can be defined
in an abstract Polish space $X$ as follows: given $p$ in $[1, +\infty)$, $d$ a lower semi-continuous
distance on $X$, and $\mu$ and $\nu$ two Borel probability measures on $X$, the Wasserstein distance
of order $p$ between $\mu$ and $\nu$ is

\[
W_p(\mu, \nu) := \inf_{\pi \in \Pi(\mu,\nu)} \left( \iint d(x,y)^p \, d\pi(x,y) \right)^{1/p}
\]

where $\pi$ runs over the set $\Pi(\mu, \nu)$ of all joint probability measures on the product space
$X \times X$ with marginals $\mu$ and $\nu$; it is easy to check [?, Theorem 7.3] that $W_p$ is a distance
on the set $P_p(X)$ of Borel probability measures $\mu$ on $X$ such that $\int d(x_0, x)^p \, d\mu(x) < +\infty$.
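
To make the infimum over couplings concrete, here is a small Python sketch for two empirical measures with the same number $n$ of equally weighted atoms. In that special case an optimal coupling can be taken to be a permutation of the atoms (a standard fact, used here as an assumption), so for tiny $n$ one may simply enumerate permutations. The function name and the sample points are hypothetical.

```python
from itertools import permutations
import numpy as np

def wasserstein_bruteforce(x, y, p=1.0):
    """W_p between (1/n) sum delta_{x_i} and (1/n) sum delta_{y_j}, by enumeration."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # cost[i, j] = |x_i - y_j|^p for the Euclidean norm
    cost = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2) ** p
    best = min(sum(cost[i, s[i]] for i in range(n)) for s in permutations(range(n)))
    return float((best / n) ** (1.0 / p))

x = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
y = x[::-1] + np.array([0.5, 0.0])    # same cloud, relabelled and shifted by (0.5, 0)
w1 = wasserstein_bruteforce(x, y, p=1)
w2 = wasserstein_bruteforce(x, y, p=2)
```

Both values equal the length of the shift, since translating a measure by a vector $c$ moves it at distance $|c|$ for every $p$. The enumeration costs $n!$ operations and is only meant to illustrate the definition.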

For this choice of distance, in view of Sanov’s theorem, a very natural class of inequalities
is the family of so-called transportation inequalities, or Talagrand inequalities (see [?]
for instance): by definition, given $p \ge 1$ and $\lambda > 0$, a probability measure $\mu$ on $X$ satisfies
$T_p(\lambda)$ if the inequality

\[
W_p(\nu, \mu) \le \sqrt{\frac{2}{\lambda} H(\nu|\mu)}
\]

holds for any probability measure $\nu$. We shall say that $\mu$ satisfies a $T_p$ inequality if it
satisfies $T_p(\lambda)$ for some $\lambda > 0$. By Jensen’s inequality, these inequalities become stronger
as $p$ becomes larger; so the weakest of all is $T_1$. Some variants introduced in [8] will also
be considered.

Of course $T_p$ is not a very explicit condition, and a priori it is not clear how to check that
a given probability measure satisfies it. It has been proven [?] that $T_1$ is equivalent
to the existence of a square-exponential moment: in other words, a reference measure $\mu$
satisfies $T_1$ if and only if there is $\alpha > 0$ such that

\[
\int e^{\alpha\, d(x,y)^2} \, d\mu(x) < +\infty
\]

for some (and thus any) $y \in X$. If that condition is satisfied, then one can find explicitly
some $\lambda$ such that $T_1(\lambda)$ holds true: see for instance [8].
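
As a worked illustration of the square-exponential moment condition (a standard computation, added here and not taken from the paper), consider the standard Gaussian measure $\gamma$ on $\mathbb{R}^d$ with the Euclidean distance and base point $y = 0$:

```latex
\[
\int_{\mathbb{R}^d} e^{\alpha |x|^2} \, d\gamma(x)
  = (2\pi)^{-d/2} \int_{\mathbb{R}^d} e^{-(1 - 2\alpha)|x|^2/2} \, dx
  = (1 - 2\alpha)^{-d/2}
  \qquad \text{for } \alpha < \tfrac{1}{2},
\]
```

while the integral is infinite for $\alpha \ge 1/2$. So the condition holds for the Gaussian measure, with any $\alpha$ in $(0, 1/2)$.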

This criterion makes $T_1$ a rather convenient inequality to use. Another popular inequality
is $T_2$, which appears naturally in many situations where a lot of structure is available, and
which has good tensorization properties in many dimensions. Up to now, $T_2$ inequalities
have not been so well characterized: it is known that they are implied by a Logarithmic
Sobolev inequality [23, ?, ?], and that they imply a Poincaré, or spectral gap, inequality
[23, ?]. See [?] for an attempt at a criterion for $T_2$. In any case, contrary to the case
$p = 1$, there is no hope to obtain $T_2$ inequalities from just integrability or decay estimates.
In this paper, we shall mainly focus on the case $p = 1$, which is much more flexible.


1.2. Metric entropy. When $X$ is a compact space, the minimum number $m(X, r)$ of balls
of radius $r$ needed to cover $X$ is called the metric entropy of $X$. This quantity plays
an important role in quantitative variants of Sanov’s Theorem [12, Exercise 4.5.5]. In the
present paper, to fix ideas we shall always be working in the particular Euclidean space
$\mathbb{R}^d$, which of course is not compact; and we shall reduce to the compact case by truncating
everything to balls of finite radius $R$. This particular choice will influence the results
through the function $m(P_p(B_R), r)$, where $B_R$ is the ball of radius $R$ centered at some
point, say the origin, and $P_p(B_R)$ is the space of probability measures on $B_R$, metrized by
$W_p$.
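
To illustrate the notion of covering number on a toy example, the following Python sketch computes a greedy cover of a finite point cloud by balls of radius $r$ centered at points of the cloud. The number of centers it returns is an upper bound on the minimal number of such balls, not the minimal number itself; the function name and the grid are hypothetical.

```python
import numpy as np

def greedy_cover(points, r):
    """Indices of centers such that every point lies within distance r of some center."""
    points = np.asarray(points, dtype=float)
    uncovered = np.ones(len(points), dtype=bool)
    centers = []
    while uncovered.any():
        i = int(np.argmax(uncovered))       # first point not yet covered becomes a center
        centers.append(i)
        dist = np.linalg.norm(points - points[i], axis=1)
        uncovered &= dist > r               # points within distance r are now covered
    return centers

# Grid of 11 x 11 points in the unit square.
side = np.linspace(0.0, 1.0, 11)
cloud = np.array([(a, b) for a in side for b in side])
coarse = greedy_cover(cloud, r=0.5)
fine = greedy_cover(cloud, r=0.1)
```

As expected, the number of balls grows quickly as the radius shrinks; it is this growth, for sets of probability measures rather than point clouds, that enters the quantitative estimates.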


1.3. Sanov-type theorems. The core of our estimates is based on variants of Sanov’s
Theorem, all dealing with independent random variables. Let $\mu$ be a given probability
measure on $\mathbb{R}^d$, and let $(X^i)_{i=1,\dots,N}$ be a sample of independent variables, all distributed
according to $\mu$; let also

\[
\hat\mu^N := \frac{1}{N} \sum_{i=1}^{N} \delta_{X^i}
\]

be the associated empirical measure. In our first main result we assume a $T_p$ inequality for
the measure $\mu$, and deduce from that an upper bound in $W_p$ distance:


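Theorem 1.1. Let $p \in [1,2]$ and let $\mu$ be a probability measure on $\mathbb{R}^d$ satisfying a $T_p(\lambda)$
inequality. Then, for any $d' > d$ and $\lambda' < \lambda$, there exists some constant $N_0$, depending
on $\lambda'$, $d'$ and some square-exponential moment of $\mu$, such that for any $\varepsilon > 0$ and
$N \ge N_0 \max(\varepsilon^{-(d'+2)}, 1)$,

\[
\mathbb{P}\left[ W_p(\mu, \hat\mu^N) > \varepsilon \right] \le e^{-\gamma_p \frac{\lambda'}{2} N \varepsilon^2}, \tag{1.1}
\]

where

\[
\gamma_p = \begin{cases} 1 & \text{if } 1 \le p < 2 \\ 3 - 2\sqrt{2} & \text{if } p = 2. \end{cases}
\]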


Compared to Sanov’s Theorem, this result is more restrictive in the sense that it requires
some extra assumptions on the reference measure $\mu$, but under these hypotheses we are
able to replace a result which was only asymptotic by a pointwise upper bound on the
error probability, together with a lower bound on the required size of the sample.
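
To get a feel for the sample-size condition, the following Python sketch evaluates the threshold $N_0 \max(\varepsilon^{-(d'+2)}, 1)$ and a Gaussian-type bound of the form $\exp(-\gamma_p \lambda' N \varepsilon^2/2)$. The numerical values of $N_0$, $\lambda'$ and $d'$ are hypothetical (the theorem only asserts that some $N_0$ exists), and the form of the bound is assumed to be that of (1.1).

```python
import math

def required_sample_size(eps, d_prime, n0=1.0):
    """Threshold N_0 * max(eps^-(d'+2), 1); n0 is a hypothetical value of N_0."""
    return n0 * max(eps ** (-(d_prime + 2)), 1.0)

def deviation_bound(n, eps, lam_prime, gamma_p=1.0):
    """Assumed Gaussian-type bound exp(-gamma_p * lam' * N * eps^2 / 2)."""
    return math.exp(-gamma_p * lam_prime * n * eps ** 2 / 2.0)

eps = 0.1
for d_prime in (1.5, 3.5):   # hypothetical d' slightly above d = 1 and d = 3
    n_min = required_sample_size(eps, d_prime)
    print(d_prime, n_min, deviation_bound(n_min, eps, lam_prime=0.9))
```

The threshold grows like a power of $1/\varepsilon$ whose exponent increases with the dimension, so the condition on $N$ quickly dominates the cost in higher dimensions.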

In view of the Kantorovich-Rubinstein duality formula

\[
W_1(\mu, \nu) = \sup \left\{ \int f \, d(\mu - \nu); \ \|f\|_{\mathrm{Lip}} \le 1 \right\}, \tag{1.2}
\]
Theorem 1.1 implies concentration inequalities such as

\[
\mathbb{P}\left[ \sup_{f; \ \|f\|_{\mathrm{Lip}} \le 1} \left( \frac{1}{N} \sum_{k=1}^{N} f(X^k) - \int f \, d\mu \right) > \varepsilon \right] \le e^{-\frac{\lambda'}{2} N \varepsilon^2}
\]

for $\lambda' < \lambda$, and $N$ sufficiently large, under the assumption that $\mu$ satisfies a $T_1$ inequality,
or equivalently admits a finite square-exponential moment. Those types of inequalities are
of interest in non-parametric statistics and choice models.


Remark 1.2. The sole inequality $T_1(\lambda)$ implies that for all 1-Lipschitz functions $f$,

\[
\mathbb{P}\left[ \frac{1}{N} \sum_{k=1}^{N} f(X^k) - \int f \, d\mu > \varepsilon \right] \le e^{-\frac{\lambda}{2} N \varepsilon^2},
\]

and it is easy to see that the coefficient $\lambda$ in this inequality is the best possible. While the
quantity controlled in Theorem 1.1 is much stronger, the estimate is weakened only in that
$\lambda$ is replaced by some $\lambda' < \lambda$ (arbitrarily close to $\lambda$) and that $N$ has to be large enough.
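In fact, a variant of the proof below would yield estimates such as

\[
\mathbb{P}\left[ W_1(\mu, \hat\mu^N) > \varepsilon \right] \le C(\varepsilon)\, e^{-\frac{\lambda'}{2} N \varepsilon^2},
\]

where now there is no restriction on $N$, but $C(\varepsilon)$ is a larger constant, explicitly computable
from the proof.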

Remark 1.3. As pointed out to us by M. Ledoux, there is another route to concentration
estimates on the empirical measure when $d = p = 1$. Indeed, in this specific case,

\[
W_1(\hat\mu^N, \mu) = \left\| \frac{1}{N} \sum_{i=1}^{N} H(\cdot - X^i) - F \right\|_{L^1(\mathbb{R})}
\]

where $H = 1_{[0,+\infty)}$ stands for the Heaviside function on $\mathbb{R}$ and $F$ denotes the repartition
function (cumulative distribution function) of $\mu$, so that

\[
\mathbb{P}\left[ W_1(\mu, \hat\mu^N) > \varepsilon \right] = \mathbb{P}\left[ \left\| \frac{1}{N} \sum_{i=1}^{N} F_i \right\|_{L^1} > \varepsilon \right]
\]

where

\[
F_i := H(\cdot - X^i) - F \qquad (1 \le i \le N)
\]

are centered $L^1(\mathbb{R})$-valued independent identically distributed random variables. But,
according to [?, Exercise 3.8.14], a centered $L^1(\mathbb{R})$-valued random variable $Y$ satisfies a
Central Limit Theorem if and only if

\[
\int_{\mathbb{R}} \left( \mathbb{E}|Y(t)|^2 \right)^{1/2} dt < +\infty,
\]

a condition which for the random variables $F_i$ can be written

\[
\int_{\mathbb{R}} \sqrt{F(t)(1 - F(t))} \, dt < +\infty. \tag{1.3}
\]

Condition (1.3) in turn holds true as soon as (for instance) $\int_{\mathbb{R}} |x|^{2+\delta} \, d\mu(x)$ is finite for some
positive $\delta$. Then we may apply a quantitative version of the Central Limit Theorem for
random variables in the Banach space $L^1(\mathbb{R})$. See [?] and [?] for related works.
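
The one-dimensional identity between $W_1$ and the $L^1$ distance of distribution functions can be checked numerically. The Python sketch below compares two empirical measures of the same size, for which $W_1$ is also the mean distance between order statistics (a standard fact, used here as an assumption). Function names and the Gaussian samples are hypothetical.

```python
import numpy as np

def w1_via_cdf(x, y):
    """L^1 distance between the empirical distribution functions of x and y."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    grid = np.sort(np.concatenate([x, y]))        # breakpoints of both step functions
    # Values of the two empirical CDFs on each interval [grid[k], grid[k+1]).
    Fx = np.searchsorted(x, grid[:-1], side="right") / len(x)
    Fy = np.searchsorted(y, grid[:-1], side="right") / len(y)
    return float(np.sum(np.abs(Fx - Fy) * np.diff(grid)))

def w1_via_sorting(x, y):
    """Mean distance between order statistics (equal sample sizes assumed)."""
    return float(np.mean(np.abs(np.sort(x) - np.sort(y))))

rng = np.random.default_rng(1)
a = rng.standard_normal(500)
b = rng.standard_normal(500) + 0.5
gap = abs(w1_via_cdf(a, b) - w1_via_sorting(a, b))   # zero up to rounding errors
```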


Remark 1.4. Theorem 1.1 applies if $N$ is at least as large as $\varepsilon^{-r}$ for some $r > d + 2$; we
do not know whether $d + 2$ here is optimal.

For the applications that we shall treat, in which the tails of the probability distributions
will be decaying very fast, Theorem 1.1 will be sufficient. However, it is worthwhile pointing
out that the technique works under much broader assumptions: weaker estimates can be
proven for probability measures that do not decay fast enough to admit finite
square-exponential moments. Here below are some such results using only polynomial moment
estimates:

Theorem 1.5. Let $q \ge 1$ and let $\mu$ be a probability measure on $\mathbb{R}^d$ such that

\[
\int |x|^q \, d\mu(x) < +\infty.
\]

Then

(i) For any $p \in [1, q/2)$, $\delta \in (0, q/p - 2)$ and $d' > d$, there exists a constant $N_0$ such that

P >e\< 

2p-\-d^ jf , 

for any e > 0 and N > Nq max(e~'^ i-p , 

(ii) For any $p \in [q/2, q)$, $\delta \in (0, q/p - 1)$ and $d' > d$ there exists a constant $N_0$ such that

P >e]< 

_ 2p+d^ ,/ j 

for any e > 0 and > A^o niax(£~^ g-p , . 

Here are also some variants under alternative “regularity” assumptions: 

Theorem 1.6. (i) Let $p \ge 1$; assume that $E_\alpha := \int e^{\alpha|x|} \, d\mu$ is finite for some $\alpha > 0$.

Then, for all $d' > d$, there exist some constants $K$ and $N_0$, depending only on $d$, $\alpha$
and $E_\alpha$, such that

P [Wp{fi,'fl^) >e]< 

for any e > 0 and N > max(e“’^^^+'^'\ 1). 

(ii) Suppose that $\mu$ satisfies $T_1$ and a Poincaré inequality; then for all $a < 2$ there exist
some constants $K$ and $N_0$ such that

P >e\< 

for any £ > 0 and N > max(£“*^^+'^'\ 1). 


(1.4) 
(iii) Let $p > 2$ and let $\mu$ be a probability measure on $\mathbb{R}^d$ satisfying $T_p(\lambda)$. Then for all
$\lambda' < \lambda$ and $d' > d$ there exists some constant $N_0$, depending on $\mu$ only through $\lambda$
and some square-exponential moment, such that


> e] < min 


(1.5) 


for any £ > 0 and N > Nq max(e: 1), 


1.4. Interacting systems of particles. We now consider a system of $N$ interacting
particles whose time-evolution is governed by the system of coupled stochastic differential
equations

\[
dX_t^i = \sqrt{2}\, dB_t^i - \nabla V(X_t^i)\, dt - \frac{1}{N} \sum_{j=1}^{N} \nabla W(X_t^i - X_t^j)\, dt, \qquad i = 1, \dots, N. \tag{1.6}
\]

Here $X_t^i$ is the position at time $t$ of particle number $i$, the $B^i$’s are $N$ independent
Brownian motions, and $V$ and $W$ are smooth potentials, sufficiently nice that (1.6) can be
solved globally in time. We shall always assume that $W$ (which can be interpreted as an
interaction potential) is a symmetric function, that is $W(-z) = W(z)$ for all $z \in \mathbb{R}^d$.

Equation (1.6) is a particularly simple instance of coupled system; in the case when $V$
is quadratic and $W$ has cubic growth, it was used as a simple mean-field kinetic model
for granular media (see e.g. [?]). While many of our results could be extended to more
general systems, that particular one will be quite enough for our exposition.
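
For readers who want to experiment with such a system, here is a minimal Python sketch of an explicit Euler-Maruyama discretization of the coupled equations above. The scheme, the quadratic potentials and all parameter values are illustrative assumptions; the time-discretization error it introduces is not covered by the estimates of this paper.

```python
import numpy as np

def simulate_particles(x0, grad_V, grad_W, T, n_steps, rng):
    """Explicit Euler-Maruyama scheme for
    dX^i = sqrt(2) dB^i - grad V(X^i) dt - (1/N) sum_j grad W(X^i - X^j) dt."""
    x = np.array(x0, dtype=float)                     # shape (N, d)
    n, d = x.shape
    dt = T / n_steps
    for _ in range(n_steps):
        diffs = x[:, None, :] - x[None, :, :]         # diffs[i, j] = X^i - X^j
        interaction = grad_W(diffs).mean(axis=1)      # (1/N) sum_j grad W(X^i - X^j)
        noise = rng.standard_normal((n, d))
        x = x - (grad_V(x) + interaction) * dt + np.sqrt(2.0 * dt) * noise
    return x

# Illustrative quadratic potentials V(x) = |x|^2 / 2 and W(z) = |z|^2 / 2.
grad_V = lambda x: x
grad_W = lambda z: z

rng = np.random.default_rng(0)
x0 = rng.standard_normal((200, 2))                    # independent initial positions
xT = simulate_particles(x0, grad_V, grad_W, T=1.0, n_steps=100, rng=rng)
```

Each step costs of the order of $N^2$ operations because of the pairwise interaction term, which is one practical reason to ask how large $N$ must be for a given accuracy.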

To this system of particles is naturally associated the empirical measure, defined for each
time $t \ge 0$ by

\[
\hat\mu_t^N := \frac{1}{N} \sum_{i=1}^{N} \delta_{X_t^i}. \tag{1.7}
\]

Under suitable assumptions on the potentials $V$ and $W$, it is a classical result that, if
the initial positions of the particle system are distributed chaotically (for instance, if they
are identically distributed, independent random variables), then the empirical measure $\hat\mu_t^N$
converges as $N \to \infty$ to a solution of the nonlinear partial differential equation

\[
\frac{\partial \mu_t}{\partial t} = \Delta \mu_t + \nabla \cdot \bigl( \mu_t \nabla (V + W * \mu_t) \bigr), \tag{1.8}
\]

where $\nabla\cdot$ stands for the divergence operator. Equation (1.8) is a simple instance of
McKean-Vlasov equation. This convergence result is part of the by now well-developed theory of
propagation of chaos, and was studied by Sznitman [?], for pedagogical reasons, in the
case of potentials that grow at most quadratically at infinity. Later, Benachour, Roynette,
Talay and Vallois [?] considered the case where the interaction potential grows faster
than quadratically. As far as the limit equation (1.8) is concerned, a discussion of its use
in the modelling of granular media in kinetic theory was performed by Benedetto, Caglioti,
Carrillo and Pulvirenti [?], while the asymptotic behavior in large time was studied by
Carrillo, McCann and Villani [?] with the help of Wasserstein distances and entropy
inequality methods. Then Malrieu [?] presented a detailed study of both limits $t \to \infty$
and $N \to \infty$ by probabilistic methods, and established estimates of the type of (0.1) under
adequate convexity assumptions on $V$ and $W$ (see also [?, Problem 15]).


As announced before, we shall now give some estimates on the convergence at the level
of the law itself. To fix ideas, we assume that $V$ and $W$ have locally bounded Hessian
matrices satisfying


(i) $D^2 V(x) \ge \beta I$, $\quad \gamma I \le D^2 W(x) \le \gamma' I$, $\quad \forall x \in \mathbb{R}^d$,

(ii) |VH(x)| = for any a > 0. 


(1.9) 


Under these assumptions, we shall derive the following bounds.


Theorem 1.7. Let $\mu_0$ be a probability measure on $\mathbb{R}^d$, admitting a finite square-exponential
moment:

\[
\exists\, \alpha_0 > 0; \qquad \int e^{\alpha_0 |x|^2} \, d\mu_0(x) < +\infty.
\]

Let $(X_0^i)_{1 \le i \le N}$ be $N$ independent random variables with common law $\mu_0$. Let $(X_t^i)$ be the
solution of (1.6) with initial value $(X_0^1, \dots, X_0^N)$, where $V$ and $W$ are assumed to
satisfy (1.9); let $\mu_t$ be the solution of (1.8) with initial value $\mu_0$. Let also $\hat\mu_t^N$ be the
empirical measure associated with the $(X_t^i)_{1 \le i \le N}$. Then, for all $T \ge 0$, there exists some
constant $K = K(T)$ such that, for any $d' > d$, there exist some constants $N_0$ and $C$ such
that for all $\varepsilon > 0$

\[
N \ge N_0 \max(\varepsilon^{-(d'+2)}, 1) \implies
\mathbb{P}\left[ \sup_{0 \le t \le T} W_1(\hat\mu_t^N, \mu_t) > \varepsilon \right] \le C (1 + T \varepsilon^{-2}) \exp\left( -K N \varepsilon^2 \right).
\]


Note that in the above theorem we have proven not only that for all $t$, the empirical
measure is close to the limit measure, but also that the probability of observing any significant
deviation during a whole time period $[0,T]$ is small.

The fact that $\hat\mu_t^N$ is very close to the deterministic measure $\mu_t$ implies the propagation of
chaos: two particles drawn from the system behave independently of each other as $N \to \infty$
(see Sznitman [?] for more details). But we can also directly study correlations between
particles and find more precise estimates: for that purpose it is convenient to consider the
empirical measure on pairs of particles, defined as

\[
\hat\mu_t^{N,2} := \frac{1}{N(N-1)} \sum_{i \neq j} \delta_{(X_t^i, X_t^j)}.
\]

By a simple adaptation of the computations appearing in the proof of Theorem 1.7 one
can prove


Theorem 1.8. With the same notation and assumptions as in Theorem 1.7, for all $T \ge 0$
and $d' > d$, there exist some constants $K > 0$ and $N_0$ such that for all $\varepsilon > 0$




P 


fUi(pf’^,/it ® pt) > e < exp {-K N . 
(Here $W_1$ stands for the Wasserstein distance of order 1 on $P_1(\mathbb{R}^d \times \mathbb{R}^d)$.) Of course, one
may similarly consider the problem of drawing $k$ particles with $k \ge 2$.

Theorems 1.7 and 1.8 use Theorem 1.1 as a crucial ingredient, which is why a strong
integrability assumption is imposed on $\mu_0$. Note however that, under stronger assumptions
on the behaviour at infinity of $V$ or $W$, such as the existence of some $\beta \in \mathbb{R}$, $B, \varepsilon > 0$ such that

\[
D^2 V(x) \ge (B |x|^{\varepsilon} + \beta) I, \qquad \forall x \in \mathbb{R}^d,
\]

it can be proven that any square exponential moment for $\mu_t$ becomes instantaneously
finite for $t > 0$. Note also that, by using Theorem 1.5 one can obtain weaker but still
relevant results of concentration of the empirical measure under just polynomial moment
assumptions on $\mu_0$, provided that $W$ does not grow too fast at infinity. To limit the size
of this paper, we shall not go further into such considerations.

1.5. Uniform in time estimates. In the “uniformly convex case” when $\beta > 0$, $\beta + 2\gamma > 0$,
it can be proven [?] that $\mu_t$ converges exponentially fast, as $t \to \infty$, to some
equilibrium measure $\mu_\infty$. In that case, it is natural to expect that the empirical measure
is a good approximation of $\mu_\infty$ as $N \to \infty$ and $t \to \infty$, uniformly in time. This is what
we shall indeed prove:

Theorem 1.9. With the same notation and assumptions as in Theorem 1.7, suppose that
$\beta > 0$, $\beta + 2\gamma > 0$. Then there exists some constant $K > 0$ such that for any $d' > d$, there
exist some constants $C$ and $N_0$ such that for all $\varepsilon > 0$

\[
N \ge N_0 \max(\varepsilon^{-(d'+2)}, 1) \implies
\sup_{t \ge 0} \mathbb{P}\left[ W_1(\hat\mu_t^N, \mu_t) > \varepsilon \right] \le C (1 + \varepsilon^{-2}) \exp\left( -K N \varepsilon^2 \right).
\]

As a consequence, there are constants $T_0$, $\varepsilon_0$ (depending on the initial datum) and $K' = K/4$
such that, under the same conditions on $N$ and $\varepsilon$,

\[
\sup_{t \ge T_0 \log(\varepsilon_0/\varepsilon)} \mathbb{P}\left[ W_1(\hat\mu_t^N, \mu_\infty) > \varepsilon \right] \le C (1 + \varepsilon^{-2}) \exp\left( -K' N \varepsilon^2 \right).
\]

Remark 1.10. In view of the results in jg, it is natural to expect that a similar conclusion 
holds true when U = 0 and W is convex enough. Propositions Id. II and Id.81 below extend 
to that case, but it seems trickier to adapt the proof of Proposition Id.81 

We conclude with an application to the numerical reconstruction of the invariant
measure.


Theorem 1.11. With the same notation and assumptions as in Theorem 1.9, consider the
mollified empirical measure (0.6). Then one can choose $\alpha = O(\varepsilon)$ in such a way that


N > 7Vomax(£-('^'+2),i) 


sup P 

t>To log(£o/£) 


Wft-fo 


> e 


< ^(1 + £-(2'^+^)) exp {-K'N. 

These results are effective: all the constants therein can be estimated explicitly in terms 
of the data. 
1.6. Strategy and plan. The strategy is rather systematic. First, we shall establish
Sanov-type bounds for independent variables in $\mathbb{R}^d$ (not depending on time), resulting in
concentration results such as Theorems 1.1 to 1.6. This will be achieved along the ideas
in [12, Exercises 4.5.5 and 6.2.19] (see also [?, Section 5]), by first truncating to a compact
ball, and then covering the set of probability measures on this ball by a finite number of
small balls (in the space of probability measures); the most tricky part will actually lie in
the optimization of parameters.

With such results in hand, we will start the study of the particle system by introducing
the nonlinear partial differential equation (1.8). For this equation, the Cauchy problem can
be solved in a satisfactory way: in particular, existence and uniqueness of a solution, which
for $t > 0$ is reasonably smooth, can be shown under various assumptions on $V$ and $W$ (see
e.g. [?]). Other regularity estimates such as the decay at infinity, or the smoothness in
time, can be established; also the convergence to equilibrium in large time can sometimes
be proven.

Next, following the presentation by Sznitman [?], we introduce a family of independent
processes $(Y_t^i)_{1 \le i \le N}$, governed by the stochastic differential equation

\[
dY_t^i = \sqrt{2}\, dB_t^i - \nabla V(Y_t^i)\, dt - \nabla W * \mu_t (Y_t^i)\, dt, \qquad Y_0^i = X_0^i. \tag{1.10}
\]

As a consequence of Itô’s formula, the law $\nu_t$ of each $Y_t^i$ is a solution of the linear partial
differential equation

\[
\frac{\partial \nu_t}{\partial t} = \Delta \nu_t + \nabla \cdot \bigl( \nu_t \nabla (V + W * \mu_t) \bigr), \qquad \nu_0 = \mu_0.
\]

But this linear equation is also solved by $\mu_t$, and a uniqueness theorem implies that
actually $\nu_t = \mu_t$ for all $t \ge 0$. See [?] for related questions on the stochastic differential
equation (1.10).

For each given $t$, the independence of the variables $Y_t^i$ and the good decay of $\mu_t$ will
imply a strong concentration of the empirical measure

\[
\hat\nu_t^N := \frac{1}{N} \sum_{i=1}^{N} \delta_{Y_t^i}.
\]
To go further, we shall establish more precise information, such as a control on

\[
\mathbb{P}\left[ \sup_{0 \le t \le T} W_1(\hat\nu_t^N, \mu_t) > \varepsilon \right].
\]

Such bounds will be obtained by combining the estimate of concentration at fixed time $t$
with some estimates of regularity of $\hat\nu_t^N$ (and $\mu_t$) in $t$, obtained via basic tools of stochastic
differential calculus (in particular Doob’s inequality).

Finally, we can show by a Gronwall-type argument that the control of the distance of
$\hat\mu_t^N$ to $\mu_t$ reduces to the control of the distance of $\hat\nu_t^N$ to $\mu_t$: for instance,

\[
\mathbb{P}\left[ \sup_{0 \le t \le T} W_1(\hat\mu_t^N, \mu_t) > \varepsilon \right]
\le \mathbb{P}\left[ \sup_{0 \le t \le T} W_1(\hat\nu_t^N, \mu_t) > C \varepsilon \right] \tag{1.11}
\]

for some constant $C$. We shall also show how a variant of this computation provides
estimates of the type of those in Theorem 1.9, and how to get data reconstruction estimates
as in Theorem 1.11.

1.7. Remarks and further developments. The results in this paper confirm what
seems to be a rather general rule about Wasserstein distances: results in distance $W_1$
are very robust and can be used in rather hard problems, with no particular structure; on
the contrary, results in distance $W_2$ are stronger, but usually require much more structure
and/or assumptions. For instance, in the study of the equation (1.8), the distance $W_2$
works beautifully, and this might be explained by the fact that (1.8) has the structure of
a gradient flow with respect to the $W_2$ distance [?]. In the problem considered by
Malrieu [?], $W_2$ is also well-adapted, but leads him to impose strong assumptions on the initial
datum $\mu_0$, such as the existence of a Logarithmic Sobolev inequality for $\mu_0$, considered as
a reference measure. As a general rule, in a context of geometric inequalities with more
or less subtle isoperimetric content, related to Brenier’s transportation mapping theorem,
$W_2$ is also the most natural distance to use [?]. On the contrary, here we are considering
quite a rough problem (concentration for the law of a random probability measure, driven
by a stochastic differential equation with coupling) and we wish to impose only natural
integrability conditions; then the distance $W_1$ is much more convenient.

Further developments could be considered. For instance, one may desire to prove some
deviation inequalities for dependent sequences, say Markov chains, as both Sanov’s
theorem and transportation inequalities can be established under appropriate ergodicity and
integrability conditions.

Considering again the problem of the particle system, in a numerical context, one may
wish to take into account the numerical errors associated with the time-discretization of the
dynamics (say an implicit Euler scheme). For concentration estimates in one observable, a
beautiful study of these issues was performed by Malrieu [?]. For concentration estimates
on the whole empirical measure, to our knowledge the study remains to be done. Also
errors due to the boundedness of the phase space actually used in the simulation might be
taken into account, etc.

At a more technical level, it would be desirable to relax the assumption of boundedness
of $D^2 W$ in Theorem 1.7, so as to allow for instance the interesting case of cubic interaction.
This is much more technical and will be considered in a separate work.

Another issue of interest would be to consider concentration of the empirical measure
on path space, i.e.

\[
\hat\mu_{[0,T]}^N := \frac{1}{N} \sum_{i=1}^{N} \delta_{(X_t^i)_{0 \le t \le T}},
\]

where $T$ is a fixed time length. Here $\hat\mu_{[0,T]}^N$ is a random measure on $C([0,T]; \mathbb{R}^d)$ and we
would like to show that it is close to the law of the trajectories of the nonlinear stochastic
differential equation

\[
dY_t = \sqrt{2}\, dB_t - \nabla V(Y_t)\, dt - (\nabla W * \mu_t)(Y_t)\, dt, \tag{1.12}
\]

where the initial datum $Y_0$ is drawn randomly according to $\mu_0$. This will imply
quantitative information on the whole trajectory of a given particle in the system.

When one wishes to adapt the general method to this question, a problem immediately
occurs: not only is $C([0,T]; \mathbb{R}^d)$ not compact, but also balls with finite radius in this
space are not compact either (of course, this is true even if the phase space of particles is
compact). One may remedy this problem by embedding $C([0,T]; B_R)$ into a space such
as $L^2([0,T]; B_R)$, equipped with the weak topology; but we do not know of any “natural”
metric on that space. There is (at least) another way out: we know from classical stochastic
processes theory that integral trajectories of differential equations driven by white noise
are typically Hölder-$\alpha$ for any $\alpha < 1/2$. This suggests a natural strategy: choose any fixed
$\alpha \in (0,1/2)$ and work in the space $\mathcal{H}^\alpha([0,T]; \mathbb{R}^d)$, equipped with the norm

\[
\|w\|_{\mathcal{H}^\alpha} := \sup_{0 \le t \le T} |w(t)| + \sup_{s \neq t} \frac{|w(t) - w(s)|}{|t - s|^{\alpha}}.
\]

For any $R > 0$, the ball of radius $R$ and center 0 (the zero function) in $\mathcal{H}^\alpha$ is compact,
and one may estimate its metric entropy. Then one can hope to perform all estimates by
using the norm $\mathcal{H}^\alpha$; for instance, establish a bound on, say, a square-exponential moment
on the law of $Y_t$:

\[
\mathbb{E} \exp\left( \beta \, \|(Y_t)_{0 \le t \le T}\|_{\mathcal{H}^\alpha}^2 \right) < +\infty.
\]

Again, to avoid expanding the size of the present paper too much, these issues will be
addressed separately.


2. The case of independent variables 

In this section we consider the case where we are given $N$ independent variables $X^i \in \mathbb{R}^d$,
distributed according to a certain law $\mu$. There is no time dependence at this stage. We
shall first examine the case when the law $\mu$ has very fast decay (Theorem 1.1), then variants
in which it decays in a slower way (Theorems 1.5 and 1.6).

2.1. Proof of Theorem 1.1. The proof splits into three steps: (1) truncation to a compact
ball $B_R$ of radius $R$, (2) covering of $P(B_R)$ by small balls of radius $r$ and Sanov’s
argument, and (3) optimization of the parameters.

Step 1: Truncation. Let $R > 0$, to be chosen later on, and let $B_R$ stand for the ball
of radius $R$ and center 0 (say) in $\mathbb{R}^d$. Let $1_{B_R}$ stand for the indicator function of $B_R$. We
truncate $\mu$ into a probability measure $\mu_R$ on the ball $B_R$:

\[
\mu_R := \frac{1_{B_R}\, \mu}{\mu[B_R]}.
\]

We wish to bound the quantity $\mathbb{P}[W_p(\hat\mu^N, \mu) > \varepsilon]$ in terms of $\mu_R$ and the associated
empirical measure. For this purpose, consider independent variables $(X^k)_{1 \le k \le N}$ drawn
according to $\mu$, and $(Y^k)_{1 \le k \le N}$ drawn according to $\mu_R$, independent of each other; then
define

\[
X_R^k := \begin{cases} X^k & \text{if } |X^k| \le R \\ Y^k & \text{if } |X^k| > R. \end{cases}
\]
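Since $X^k$ and $X_R^k$ are distributed according to $\mu$ and $\mu_R$ respectively, we have, by
definition of Wasserstein distance,

\[
W_p^p(\mu, \mu_R) \le \mathbb{E}|X^1 - X_R^1|^p = \mathbb{E}\left( |X^1 - Y^1|^p \, 1_{|X^1| > R} \right) \le 2^p\, \mathbb{E}\left( |X^1|^p \, 1_{|X^1| > R} \right)
= 2^p \int_{\{|x| > R\}} |x|^p \, d\mu(x).
\]

But $\mu$ satisfies a $T_p(\lambda)$ inequality for some $p \ge 1$, hence a fortiori a $T_1(\lambda)$ inequality, so

\[
E_\alpha := \int e^{\alpha |x|^2} \, d\mu(x) < +\infty
\]

for some $\alpha > 0$ (any $\alpha < \lambda/2$ would do). If $R$ is large enough (say, $R \ge \sqrt{p/(2\alpha)}$), then
the function $r \mapsto r^p e^{-\alpha r^2}$ is nonincreasing for $r \ge R$, and then

\[
\int_{\{|x| > R\}} |x|^p \, d\mu(x) \le \frac{R^p}{e^{\alpha R^2}} \int 1_{\{|x| > R\}} \, e^{\alpha |x|^2} \, d\mu(x) \le R^p e^{-\alpha R^2} E_\alpha.
\]

We conclude that

\[
W_p(\mu, \mu_R) \le 2 E_\alpha^{1/p} R\, e^{-\frac{\alpha}{p} R^2} \qquad (\alpha < \lambda/2, \ R \ge \sqrt{p/(2\alpha)}). \tag{2.1}
\]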
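
As a numerical sanity check of this truncation step, the Python sketch below compares, for the standard Gaussian measure on the real line and $p = 1$, the exact value of $2 \int_{\{|x| > R\}} |x| \, d\mu(x)$ with a bound of the form $2 E_\alpha^{1/p} R\, e^{-\alpha R^2/p}$. The choice of measure, the value of $\alpha$ and the function name are illustrative assumptions.

```python
import math

def truncation_bound(R, alpha, E_alpha, p=1.0):
    """Evaluate 2 * E_alpha^(1/p) * R * exp(-alpha * R^2 / p), for R >= sqrt(p / (2 alpha))."""
    if R < math.sqrt(p / (2.0 * alpha)):
        raise ValueError("the estimate is stated for R >= sqrt(p / (2 alpha))")
    return 2.0 * E_alpha ** (1.0 / p) * R * math.exp(-alpha * R ** 2 / p)

# Standard Gaussian on the real line: E_alpha = (1 - 2 alpha)^(-1/2) for alpha < 1/2.
alpha = 0.25
E_alpha = (1.0 - 2.0 * alpha) ** -0.5
for R in (2.0, 4.0, 6.0):
    # For p = 1 the integral of |x| over {|x| > R} equals 2 * exp(-R^2 / 2) / sqrt(2 pi).
    exact_tail = 2.0 * math.exp(-R ** 2 / 2.0) / math.sqrt(2.0 * math.pi)
    print(R, 2.0 * exact_tail, truncation_bound(R, alpha, E_alpha))
```

In this example the bound is far from sharp, but both quantities decay like a Gaussian in $R$, which is all the truncation argument needs.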


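On the other hand, the empirical measures

\[
\hat\mu^N := \frac{1}{N} \sum_{k=1}^{N} \delta_{X^k}, \qquad \hat\mu_R^N := \frac{1}{N} \sum_{k=1}^{N} \delta_{X_R^k}
\]

satisfy

\[
W_p^p(\hat\mu_R^N, \hat\mu^N) \le \frac{1}{N} \sum_{k=1}^{N} |X_R^k - X^k|^p \le \frac{1}{N} \sum_{k=1}^{N} Z^k,
\]

where $Z^k := 2^p |X^k|^p \, 1_{|X^k| > R}$ $(k = 1, \dots, N)$. Then, for any $p \in [1,2]$, we can introduce
parameters $\varepsilon$ and $\theta > 0$, and use Chebyshev’s exponential inequality and the independence
of the variables $Z^k$ to obtain

\[
\begin{aligned}
\mathbb{P}\left[ W_p(\hat\mu_R^N, \hat\mu^N) > \varepsilon \right]
&\le \mathbb{P}\left[ \frac{1}{N} \sum_{k=1}^{N} Z^k > \varepsilon^p \right]
= \mathbb{P}\left[ \exp\left( \theta \sum_{k=1}^{N} (Z^k - \varepsilon^p) \right) > 1 \right] \\
&\le \mathbb{E} \exp\left( \theta \sum_{k=1}^{N} (Z^k - \varepsilon^p) \right)
= \exp\left( -N \left[ \theta \varepsilon^p - \log \mathbb{E} \exp(\theta Z^1) \right] \right).
\end{aligned} \tag{2.2}
\]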
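
For the reader’s convenience, the last equality in this chain can be spelled out as follows; it only uses the fact that the variables $Z^k$ are independent and identically distributed.

```latex
\[
\mathbb{E} \exp\Bigl( \theta \sum_{k=1}^{N} (Z^k - \varepsilon^p) \Bigr)
  = \prod_{k=1}^{N} e^{-\theta \varepsilon^p} \, \mathbb{E}\, e^{\theta Z^k}
  = \Bigl( e^{-\theta \varepsilon^p} \, \mathbb{E}\, e^{\theta Z^1} \Bigr)^{N}
  = \exp\Bigl( -N \bigl[ \theta \varepsilon^p - \log \mathbb{E}\, e^{\theta Z^1} \bigr] \Bigr).
\]
```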
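In the case when $p < 2$, for any $\alpha_1 < \alpha < \lambda/2$, there exists some constant $R_0 = R_0(\alpha_1, p)$ such that

\[ 2^p \theta r^p \le \alpha_1 r^2 \]

for all $\theta > 0$ and $r \ge R_0\, \theta^{\frac{1}{2-p}}$, whence

\[ \mathbb{E} \exp(\theta Z_1) \le \mathbb{E} \exp\big(\alpha_1 |X^1|^2\, 1_{|X^1| > R}\big) \le 1 + E_\alpha e^{(\alpha_1 - \alpha) R^2}. \]

As a consequence, using $\log(1+x) \le x$,

\[ \mathbb{P}\big[W_p(\hat\mu_R^N, \hat\mu^N) > \varepsilon\big] \le \exp\Big(-N\big[\theta \varepsilon^p - E_\alpha e^{(\alpha_1-\alpha)R^2}\big]\Big). \tag{2.3} \]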

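From (2.1), (2.3) and the triangle inequality for $W_p$,

\begin{align*}
\mathbb{P}\big[W_p(\mu, \hat\mu^N) > \varepsilon\big]
 &\le \mathbb{P}\big[W_p(\mu, \mu_R) + W_p(\mu_R, \hat\mu_R^N) + W_p(\hat\mu_R^N, \hat\mu^N) > \varepsilon\big] \\
 &\le \mathbb{P}\big[W_p(\mu_R, \hat\mu_R^N) > \eta\varepsilon - 2E_\alpha^{1/p} R\, e^{-\frac{\alpha}{p}R^2}\big] + \mathbb{P}\big[W_p(\hat\mu_R^N, \hat\mu^N) > (1-\eta)\varepsilon\big] \\
 &\le \mathbb{P}\big[W_p(\mu_R, \hat\mu_R^N) > \eta\varepsilon - 2E_\alpha^{1/p} R\, e^{-\frac{\alpha}{p}R^2}\big]
   + \exp\Big(-N\big(\theta (1-\eta)^p \varepsilon^p - E_\alpha e^{(\alpha_1-\alpha)R^2}\big)\Big). \tag{2.4}
\end{align*}

This estimate was established for any given $p \in [1,2)$, $\eta \in (0,1)$, $\varepsilon, \theta > 0$, $\alpha_1 < \alpha < \lambda/2$ and $R \ge \max\big(\sqrt{p/(2\alpha)},\ R_0\,\theta^{\frac{1}{2-p}}\big)$, where $R_0$ is a constant depending only on $\alpha_1$ and $p$.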

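In the case when $p = 2$, we let $Z_k := |Y^k - X^k|^2\, 1_{|X^k|>R}$ $(k = 1,\dots,N)$, and starting from inequality (2.2) again, we choose $\alpha_1 < \alpha$ and then $\theta := \alpha_1/2$; by definition of $Z_1$ and $\mu_R$,

\begin{align*}
\mathbb{E}\exp\Big(\frac{\alpha_1}{2} Z_1\Big)
 &= \mu[B_R] + \int\!\!\int_{\{|x|>R\}} \exp\Big(\frac{\alpha_1}{2} |y-x|^2\Big)\, d\mu(x)\, d\mu_R(y) \\
 &= \mu[B_R] + \frac{1}{\mu[B_R]} \int_{\{|y|\le R\}}\int_{\{|x|>R\}} \exp\Big(\frac{\alpha_1}{2} |y-x|^2\Big)\, d\mu(x)\, d\mu(y) \\
 &\le 1 + \big(1 - E_\alpha e^{-\alpha R^2}\big)^{-1} \int_{\{|y|\le R\}} e^{\alpha_1 |y|^2}\, d\mu(y) \int_{\{|x|>R\}} e^{\alpha_1 |x|^2}\, d\mu(x) \\
 &\le 1 + \big(1 - E_\alpha e^{-\alpha R^2}\big)^{-1} E_\alpha^2\, e^{(\alpha_1-\alpha)R^2}
  \le 1 + 2 E_\alpha^2\, e^{(\alpha_1 - \alpha) R^2}
\end{align*}

for $R$ large enough, from which

\[ \mathbb{P}\big[W_2(\hat\mu_R^N, \hat\mu^N) > \varepsilon\big] \le \exp\Big(-N\Big(\frac{\alpha_1}{2}\varepsilon^2 - 2E_\alpha^2 e^{(\alpha_1-\alpha)R^2}\Big)\Big). \tag{2.5} \]

To sum up, in the case $p = 2$ equation (2.4) reads

\begin{align*}
\mathbb{P}\big[W_2(\mu, \hat\mu^N) > \varepsilon\big]
 \le{}& \mathbb{P}\big[W_2(\mu_R, \hat\mu_R^N) > \eta\varepsilon - 2E_\alpha^{1/2} R\, e^{-\frac{\alpha}{2}R^2}\big] \\
 &+ \exp\Big(-N\Big(\frac{\alpha_1}{2}(1-\eta)^2\varepsilon^2 - 2E_\alpha^2 e^{(\alpha_1-\alpha)R^2}\Big)\Big). \tag{2.6}
\end{align*}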
So, apart from some error terms, for all $p \in [1,2]$ we have reduced the initial problem to establishing the result only for the probability law $\mu_R$, whose support lies in the compact set $B_R$.

We end this truncation procedure by proving that $\mu_R$ satisfies some modified $T_p$ inequality. Let indeed $\nu$ be a probability measure on $B_R$, absolutely continuous with respect to $\mu$ (and hence with respect to $\mu_R$); then, when $R$ is larger than some constant depending only on $E_\alpha$, we can write

\begin{align*}
H(\nu|\mu_R) - H(\nu|\mu)
 &= \int_{B_R} \log\frac{d\nu}{d\mu_R}\, d\nu - \int_{B_R} \log\frac{d\nu}{d\mu}\, d\nu = \log \mu[B_R] \\
 &\ge \log\big(1 - E_\alpha e^{-\alpha R^2}\big) \\
 &\ge -2E_\alpha e^{-\alpha R^2}. \tag{2.7}
\end{align*}

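But $\mu$ satisfies a $T_p(\lambda)$ inequality, so

\[ H(\nu|\mu) \ge \frac{\lambda}{2}\, W_p(\nu,\mu)^2 \ge \frac{\lambda}{2}\, \big(W_p(\nu,\mu_R) - W_p(\mu_R,\mu)\big)^2 \]

by the triangle inequality. Combining this with (2.7), we obtain

\[ H(\nu|\mu_R) \ge \frac{\lambda}{2}\, \big(W_p(\nu,\mu_R) - W_p(\mu_R,\mu)\big)^2 - 2E_\alpha e^{-\alpha R^2}. \]

From this, inequality (2.1) and the elementary inequality

\[ \forall a \in (0,1)\ \ \exists C_a > 0;\quad \forall x, y \in \mathbb{R}, \quad (x-y)^2 \ge (1-a)\,x^2 - C_a\, y^2, \tag{2.8} \]

we deduce that for any $\lambda_1 < \lambda$ there exists some constant $K$ such that

\[ H(\nu|\mu_R) \ge \frac{\lambda_1}{2}\, W_p(\mu_R,\nu)^2 - K R^2 e^{-\alpha R^2}. \tag{2.9} \]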
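For completeness, here is one way to obtain the elementary inequality (2.8) with an explicit constant. This is a sketch; the value $C_a = 1/a - 1$ is our choice and is not specified in the text.

```latex
\[
2xy \le a\,x^{2} + \frac{1}{a}\,y^{2}
\quad\Longrightarrow\quad
(x-y)^{2} = x^{2} - 2xy + y^{2}
  \ge (1-a)\,x^{2} - \Big(\frac{1}{a}-1\Big)\,y^{2},
\]
so (2.8) holds with $C_a = \dfrac{1}{a} - 1 > 0$ for every $a \in (0,1)$.
```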


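Step 2: Covering by small balls. In this second step we derive quantitative estimates on $\hat\mu_R^N$. Let $\varphi$ be a bounded continuous function on $\mathbb{R}^d$, and let $B$ be a Borel set in $\mathcal{P}(B_R)$ (equipped with the weak topology of convergence against bounded continuous test functions). By Chebyshev's exponential inequality and the independence of the variables $X_R^k$,

\begin{align*}
\mathbb{P}\big[\hat\mu_R^N \in B\big]
 &\le \exp\Big(-N \inf_{\nu\in B} \int_{B_R} \varphi\, d\nu\Big)\ \mathbb{E}\Big(\exp\Big(N \int_{B_R} \varphi\, d\hat\mu_R^N\Big)\Big) \\
 &= \exp\Big(-N \Big[\inf_{\nu\in B} \int_{B_R} \varphi\, d\nu - \frac1N \log \mathbb{E}\exp\Big(N\int_{B_R}\varphi\, d\hat\mu_R^N\Big)\Big]\Big) \\
 &= \exp\Big(-N \Big[\inf_{\nu\in B} \int_{B_R} \varphi\, d\nu - \frac1N \log \mathbb{E}\exp\Big(\sum_{k=1}^N \varphi(X_R^k)\Big)\Big]\Big) \\
 &= \exp\Big(-N \inf_{\nu\in B} \Big[\int_{B_R} \varphi\, d\nu - \log \int_{B_R} e^{\varphi}\, d\mu_R\Big]\Big).
\end{align*}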
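As $\varphi$ is arbitrary, we can pass to the supremum and find

\[ \mathbb{P}\big[\hat\mu_R^N \in B\big] \le \exp\Big(-N \sup_{\varphi \in C_b(\mathbb{R}^d)}\ \inf_{\nu \in B} \Big[\int_{B_R} \varphi\, d\nu - \log\int_{B_R} e^{\varphi}\, d\mu_R\Big]\Big). \]

Now we note that the quantity $\int \varphi\, d\nu - \log\int e^{\varphi}\, d\mu_R$ is linear in $\nu$ and concave, upper semicontinuous (with respect to the topology of uniform convergence) in $\varphi$; if we further assume that $B$ is convex and compact, then (for instance) Sion's min-max theorem (Theorem 4.2' in the cited reference) ensures that

\[ \sup_{\varphi \in C_b(\mathbb{R}^d)}\ \inf_{\nu \in B} \Big[\int_{B_R} \varphi\, d\nu - \log\int_{B_R} e^{\varphi}\, d\mu_R\Big] = \inf_{\nu \in B}\ \sup_{\varphi \in C_b(\mathbb{R}^d)} \Big[\int_{B_R} \varphi\, d\nu - \log\int_{B_R} e^{\varphi}\, d\mu_R\Big]. \]

By the dual formulation of the $H$ functional (Lemma 6.2.13 in the cited reference), we conclude that

\[ \mathbb{P}\big[\hat\mu_R^N \in B\big] \le \exp\Big(-N \inf_{\nu\in B} H(\nu|\mu_R)\Big). \tag{2.10} \]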

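Now, let $\delta > 0$ and let $A$ be a measurable subset of $\mathcal{P}(B_R)$. We cover the latter with $\mathcal{N}_A$ balls $(B_i)_{1\le i\le \mathcal{N}_A}$ with radius $\delta/2$ in $W_p$ metric. Each of these balls is convex and compact, and it is included in the $\delta$-thickening of $A$ in $W_p$ metric, defined as

\[ A_\delta := \big\{\nu \in \mathcal{P}(B_R);\ \exists \nu_a \in A,\ W_p(\nu,\nu_a) \le \delta\big\}. \]

So, by (2.10) we get

\begin{align*}
\mathbb{P}\big[\hat\mu_R^N \in A\big]
 &\le \mathbb{P}\Big[\hat\mu_R^N \in \bigcup_{i=1}^{\mathcal{N}_A} B_i\Big]
  \le \sum_{i=1}^{\mathcal{N}_A} \mathbb{P}\big[\hat\mu_R^N \in B_i\big] \\
 &\le \sum_{i=1}^{\mathcal{N}_A} \exp\Big(-N \inf_{\nu\in B_i} H(\nu|\mu_R)\Big)
  \le \mathcal{N}_A \exp\Big(-N \inf_{\nu \in A_\delta} H(\nu|\mu_R)\Big). \tag{2.11}
\end{align*}

We now apply this estimate with

\[ A := \Big\{\nu \in \mathcal{P}(B_R);\ W_p(\nu,\mu_R) \ge \eta\varepsilon - 2E_\alpha^{1/p} R\, e^{-\frac{\alpha}{p}R^2}\Big\}. \]

From (2.9) we have, for any $\nu \in A_\delta$,

\[ H(\nu|\mu_R) \ge \frac{\lambda_1}{2}\, W_p(\nu,\mu_R)^2 - K R^2 e^{-\alpha R^2} \ge \frac{\lambda_1}{2}\, m^2 - K R^2 e^{-\alpha R^2}, \]

where

\[ m := \max\Big(\eta\varepsilon - 2E_\alpha^{1/p} R\, e^{-\frac{\alpha}{p}R^2} - \delta,\ 0\Big). \]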
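Combining this with (2.11), we conclude that

\[ \mathbb{P}\Big[W_p(\mu_R, \hat\mu_R^N) \ge \eta\varepsilon - 2E_\alpha^{1/p} R\, e^{-\frac{\alpha}{p}R^2}\Big] \le \mathcal{N}_A \exp\Big(-N\Big(\frac{\lambda_1}{2}\, m^2 - K R^2 e^{-\alpha R^2}\Big)\Big). \tag{2.12} \]

Now, given any $\lambda_2 < \lambda_1$, it follows from (2.8) that there exist $\delta_1$, $\eta_1$ and $K_1$, depending on $\alpha$, $\lambda_1$, $\lambda_2$, such that

\[ \frac{\lambda_1}{2}\, m^2 - K R^2 e^{-\alpha R^2} \ge \frac{\lambda_2}{2}\, \varepsilon^2 - K_1 R^2 e^{-\alpha R^2}, \tag{2.13} \]

where $\delta := \delta_1 \varepsilon$ and $\eta := \eta_1$.

Though this inequality holds independently of $p$, we shall use it only in the case when $p < 2$. In the case $p = 2$, on the other hand, we note that for any $\eta \in (0,1)$,

\[ \frac{\lambda_1}{2}\, m^2 - K R^2 e^{-\alpha R^2} \ge \frac{\lambda_2}{2}\, \eta^2 \varepsilon^2 - K_1 R^2 e^{-\alpha R^2}, \tag{2.14} \]

where $\delta := \delta_1 \varepsilon$.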

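Finally, we bound $\mathcal{N}_A$ by means of Theorem A.1 in Appendix A: there exists some constant $C$ (only depending on $d$) such that for all $R > 0$ and $\delta > 0$ the set $\mathcal{P}(B_R)$ can be covered by

\[ \Big(C\,\frac{R}{\delta} \vee 1\Big)^{C (R/\delta)^d} \]

balls of radius $\delta$ in $W_p$ metric, where $a \vee b$ stands for $\max(a,b)$. In particular, given $\delta = \delta_1 \varepsilon$, we can choose

\[ \mathcal{N}_A \le \Big(K_2\,\frac{R}{\varepsilon} \vee 1\Big)^{K_2 (R/\varepsilon)^d} \tag{2.15} \]

balls of radius $\delta$, for some constant $K_2$ depending on $\lambda_1$ and $\lambda_2$ (via $\delta_1$) but neither on $\varepsilon$ nor on $R$. (The purpose of the 1 in $(K_2 R/\varepsilon \vee 1)$ is to make sure that the estimate is also valid when $\varepsilon > R$.)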


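Combining (2.4), (2.12), (2.13) and (2.15), we find that, given $p \in [1,2)$, $\lambda_2 < \lambda$ and $\alpha_1 < \alpha < \lambda/2$, there exist some constants $K_1$, $K_2$ and $R_1$ such that for all $\varepsilon, \zeta > 0$ and $R \ge R_1 \max\big(1, \zeta^{\frac{1}{2-p}}\big)$,

\begin{align*}
\mathbb{P}\big[W_p(\mu, \hat\mu^N) > \varepsilon\big]
 \le{}& \Big(K_2\,\frac{R}{\varepsilon} \vee 1\Big)^{K_2 (R/\varepsilon)^d} \exp\Big(-N\Big(\frac{\lambda_2}{2}\,\varepsilon^2 - K_1 R^2 e^{-\alpha R^2}\Big)\Big) \\
 &+ \exp\Big(-N\big(K_4\, \zeta\, \varepsilon^p - E_\alpha e^{(\alpha_1-\alpha)R^2}\big)\Big) \tag{2.16}
\end{align*}

for some constant $K_4 = K_4(\eta_1)$. In the case when $p = 2$, we obtain similarly

\begin{align*}
\mathbb{P}\big[W_2(\mu, \hat\mu^N) > \varepsilon\big]
 \le{}& \Big(K_2\,\frac{R}{\varepsilon} \vee 1\Big)^{K_2 (R/\varepsilon)^d} \exp\Big(-N\Big(\frac{\lambda_2}{2}\,\eta^2\varepsilon^2 - K_1 R^2 e^{-\alpha R^2}\Big)\Big) \\
 &+ \exp\Big(-N\Big(\frac{\alpha_1}{2}(1-\eta)^2\varepsilon^2 - 2E_\alpha^2 e^{(\alpha_1-\alpha)R^2}\Big)\Big) \tag{2.17}
\end{align*}

for any $\eta \in (0,1)$ and $R \ge R_1$.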

These estimates are not really appealing (!), but they are rather precise and general. In the rest of the section we shall show that an adequate choice of $R$ leads to a simplified expression.


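Step 3: Choice of the parameters.

We first consider the case when $p \in [1,2)$. Let $\lambda' < \lambda_2$, $\alpha' < \alpha$ and $d_1 > d$. We claim that

\[ \mathbb{P}\big[W_p(\mu, \hat\mu^N) > \varepsilon\big] \le \exp\Big(-\frac{\lambda'}{2}\, N \varepsilon^2\Big) + \exp\big(-\alpha' N \varepsilon^2\big) \]

as soon as

\[ R^2 \ge R_2 \max\Big(1,\ \varepsilon^2,\ \log\frac{1}{\varepsilon^2}\Big), \qquad N \ge K_5\, \frac{R^{d_1}}{\varepsilon^{d_1+2}} \tag{2.18} \]

for some constants $R_2$ and $K_5$ depending on $\mu$ only through $\lambda$, $\alpha$ and $E_\alpha$.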

Indeed, on one hand 


Ko 


R 


log ( K 2 — 1 < Kq 


iT 


for some constant Kq, on the other hand 




for R large enongh, and then 

Ke 


1 

1 


\e J 

L 2 J 




< -N 


X'e 


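for $R^2/\log(1/\varepsilon^2)$ and $N \varepsilon^{d_1+2}/R^{d_1}$ large enough; this is enough to bound the first term in the right-hand side of (2.16) if moreover $R/\varepsilon$ is large enough.

Moreover, letting $\alpha_2 \in (\alpha', \alpha_1)$, we can choose $\zeta$ in such a way that $K_4\, \zeta = \alpha_2\, \varepsilon^{2-p}$, so that

\[ \exp\Big(-N\big(K_4\, \zeta\, \varepsilon^p - E_\alpha e^{(\alpha_1-\alpha)R^2}\big)\Big) = \exp\Big(-N\big(\alpha_2\, \varepsilon^2 - E_\alpha e^{(\alpha_1-\alpha)R^2}\big)\Big), \]

which in the end can be bounded by

\[ \exp\big(-N \alpha' \varepsilon^2\big) \]

if $R$ and $R^2/\log(1/\varepsilon^2)$ are large enough. With this one can get a bound on the right-hand side of (2.16).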
Now let us check that conditions (2.18) can indeed be fulfilled. Clearly, the first condition
holds true for all e G (0,1) and > R 3 log(^), where R 3 and Kq are positive constants. 
Then, we can choose 

so that the second condition holds as an equality. This choice is admissible as soon as 

>i?3logf^ 


(2.19) 


.ll'C / 

and this, in turn, holds true as soon as 

N>K 7 

where d' is such that d' > d, and K 7 is large enough. 

If £ > 1 , then we can choose R^ = be. R = y/^e, and then the second inequality 

in (ITTHl) will be true as soon as N is large enough. 

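To sum up: given $d' > d$, $\lambda' < \lambda$ and $\alpha' < \alpha$, there exists some constant $N_0$, depending on $d'$ and depending on $\mu$ only through $\lambda$, $\alpha$ and $E_\alpha$, such that for all $\varepsilon > 0$,

\[ \mathbb{P}\big[W_p(\mu, \hat\mu^N) > \varepsilon\big] \le \exp\Big(-\frac{\lambda'}{2}\, N\varepsilon^2\Big) + \exp\big(-\alpha' N \varepsilon^2\big) \]

as soon as $N \ge N_0 \max\big(\varepsilon^{-(d'+2)}, 1\big)$. Then we note that, given $K < \min(\lambda'/2, \alpha')$, the inequality

\[ \exp\Big(-\frac{\lambda'}{2}\, N\varepsilon^2\Big) + \exp\big(-\alpha' N \varepsilon^2\big) \le \exp\big(-K N \varepsilon^2\big) \]

holds as soon as the above condition on $N$ is satisfied for some $K_7$ large enough. To conclude the proof of Theorem 1.1 in the case when $p \in [1,2)$, it is sufficient to choose $\lambda' < \lambda$, $\alpha' < \lambda/2$.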

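Now, in the case when $p = 2$, given $\lambda_3 < \lambda_2$ and $\alpha_2 < \alpha_1$, conditions (2.18) imply

\[ \mathbb{P}\big[W_2(\mu, \hat\mu^N) > \varepsilon\big] \le \exp\Big(-\frac{\lambda_3}{2}\, \eta^2 N \varepsilon^2\Big) + \exp\Big(-\frac{\alpha_2}{2}\, (1-\eta)^2 N \varepsilon^2\Big). \]

Then we let $\alpha_2 := \lambda_3/2$ and $\eta := \sqrt{2} - 1$, so that

\[ \frac{\lambda_3}{2}\, \eta^2 = \frac{\alpha_2}{2}\, (1-\eta)^2 = (3 - 2\sqrt{2})\, \frac{\lambda_3}{2}. \]

Then

\[ \mathbb{P}\big[W_2(\mu, \hat\mu^N) > \varepsilon\big] \le 2 \exp\Big(-(3 - 2\sqrt{2})\, \frac{\lambda_3}{2}\, N \varepsilon^2\Big); \]

for $\lambda' < \lambda_3$, the above quantity is bounded by

\[ \exp\Big(-(3 - 2\sqrt{2})\, \frac{\lambda'}{2}\, N \varepsilon^2\Big) \]

as soon as the above condition on $N$ is enforced with $K_7$ large enough. This concludes the argument.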


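2.2. Proof of Theorem 1.5. It is very similar to the proof of Theorem 1.1, so we shall only explain where the differences lie. Obviously, the main difficulty will consist in the control of tails.

We first let $p \in [1, q)$, $\alpha \in [1, q/p)$ and $R > 0$, and introduce

\[ M_q := \int_{\mathbb{R}^d} |x|^q\, d\mu(x). \]

Then (2.1) may be replaced by

\[ W_p^p(\mu, \mu_R) \le 2^p M_q R^{p-q} \tag{2.20} \]

and (2.3) by

\[ \mathbb{P}\big[W_p(\hat\mu_R^N, \hat\mu^N) > \varepsilon\big] \le C\, N^{\bar\alpha - \alpha}\, \frac{R^{\alpha p - q}}{(\varepsilon^p - C R^{p-q})^{\alpha}} \tag{2.21} \]

for some constant $C$ depending on $\alpha$ and $M_q$, where $\bar\alpha := \max(\alpha/2, 1)$.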

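Let us establish for instance (2.21). Introduce

\[ Z_k := |Y^k - X^k|^p\, 1_{|X^k| > R} \qquad (1 \le k \le N). \]

By Chebyshev's inequality,

\begin{align*}
\mathbb{P}\big[W_p(\hat\mu_R^N, \hat\mu^N) > \varepsilon\big]
 &\le \mathbb{P}\Big[\frac1N \sum_{k=1}^N Z_k > \varepsilon^p\Big]
  = \mathbb{P}\Big[\sum_{k=1}^N (Z_k - \mathbb{E} Z_k) > N(\varepsilon^p - \mathbb{E} Z_1)\Big] \\
 &\le \frac{\mathbb{E}\Big|\sum_{k=1}^N (Z_k - \mathbb{E} Z_k)\Big|^{\alpha}}{\big(N(\varepsilon^p - \mathbb{E} Z_1)\big)^{\alpha}}
\end{align*}

provided that $\varepsilon^p > \mathbb{E} Z_1$. But, since the random variables $(Z_k - \mathbb{E} Z_k)_k$ are independent and identically distributed, with zero mean, there exists some constant $C$ depending on $\alpha$ such that

\[ \mathbb{E}\Big|\sum_{k=1}^N (Z_k - \mathbb{E} Z_k)\Big|^{\alpha} \le C\, N^{\bar\alpha}\, \mathbb{E}|Z_1 - \mathbb{E} Z_1|^{\alpha} \]

where $\bar\alpha := \max(\alpha/2, 1)$. This inequality is a consequence of Rosenthal's inequality in the case when $\alpha \ge 2$, but also holds true if $\alpha \in [1,2)$ (see for instance pp. 62 and 82 of the cited reference).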
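Then, on one hand,

\[ \mathbb{E} Z_1 = \mathbb{E}\, |Y^1 - X^1|^p\, 1_{|X^1| > R} \le 2^p M_q R^{p-q}, \]

while on the other hand,

\begin{align*}
\mathbb{E}|Z_1 - \mathbb{E} Z_1|^{\alpha}
 &= \mathbb{E}\,\Big|\, |Y^1 - X^1|^p\, 1_{|X^1|>R} - \mathbb{E}\,|Y^1 - X^1|^p\, 1_{|X^1|>R} \Big|^{\alpha} \\
 &\le C\, \mathbb{E}\, |Y^1 - X^1|^{\alpha p}\, 1_{|X^1|>R} \le C M_q R^{\alpha p - q}
\end{align*}

with $C$ standing for various constants. Collecting these two estimates, we conclude to the validity of (2.21) for $R^{q-p} \varepsilon^p$ large enough.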
Then (2.20) and (2.21) together ensure that

\[ \mathbb{P}\big[W_p(\mu, \hat\mu^N) > \varepsilon\big] \le \mathbb{P}\big[W_p(\mu_R, \hat\mu_R^N) > \eta\varepsilon - 2M_q^{1/p} R^{1 - q/p}\big] + C N^{\bar\alpha - \alpha}\, \frac{R^{\alpha p - q}}{\big((1-\eta)^p \varepsilon^p - C R^{p-q}\big)^{\alpha}} \tag{2.22} \]

for any $\eta \in (0,1)$, $\varepsilon > 0$ and $R^{q-p} (1-\eta)^p \varepsilon^p$ large enough.

Since $\mu_R$ is supported in $B_R$, the Csiszár-Kullback-Pinsker inequality and the Kantorovich-Rubinstein formulation of the $W_1$ distance together ensure that it satisfies a $T_1(R^{-2})$ inequality (see e.g. Particular Case 5 in the cited reference, with $p = 1$). This estimate also extends to any $W_p$ distance, not as a penalized $T_p$ inequality as in (2.9), but rather as

\[ W_p^{2p}(\nu, \mu_R) \le 2^{2p-1} R^{2p}\, H(\nu|\mu_R) \tag{2.23} \]

(see again Particular Case 5 in the cited reference).

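From (2.22) and (2.23) we deduce (as in (2.16)) that

\[ \mathbb{P}\big[W_p(\hat\mu^N, \mu) > \varepsilon\big] \le \Big(K_1\,\frac{R}{\delta} \vee 1\Big)^{K_1 (R/\delta)^d} \exp\Big(-N\, \frac{m^{2p}}{2^{2p-1} R^{2p}}\Big) + C N^{\bar\alpha-\alpha}\, \frac{R^{\alpha p - q}}{\big((1-\eta)^p \varepsilon^p - C R^{p-q}\big)^{\alpha}} \tag{2.24} \]

for any $\delta$, where now

\[ m := \big(\eta\varepsilon - 2M_q^{1/p} R^{1-q/p} - \delta\big)_+ . \]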

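Letting $\eta_1 < \eta$ and $d' > d$, and choosing $\delta = \delta_0\, \varepsilon$, we deduce

\[ \mathbb{P}\big[W_p(\hat\mu^N, \mu) > \varepsilon\big] \le \exp\Big(K_1 \Big(\frac{R}{\varepsilon}\Big)^{d'} - \frac{\eta_1^{2p}}{2^{2p-1}}\, \frac{N \varepsilon^{2p}}{R^{2p}} + \frac{K_1}{2^{2p-1}}\, \frac{N}{R^{2q}}\Big) + C N^{\bar\alpha-\alpha}\, \frac{R^{\alpha p - q}}{\big((1-\eta_1)^p \varepsilon^p - C R^{p-q}\big)^{\alpha}} \]

for $R^{q-p} \varepsilon^p (1-\eta_1)^p$ large enough, and then

\[ \mathbb{P}\big[W_p(\hat\mu^N, \mu) > \varepsilon\big] \le \exp\Big(-\frac{\eta_2^{2p}}{2^{2p-1}}\, \frac{N \varepsilon^{2p}}{R^{2p}}\Big) + C' N^{\bar\alpha - \alpha}\, \frac{R^{\alpha p - q}}{(1-\eta_2)^{\alpha p}\, \varepsilon^{\alpha p}} \tag{2.25} \]

for $\eta_2 < \eta_1$, provided that the conditions

\[ R \ge R_1\, \varepsilon^{-\frac{p}{q-p}}, \qquad N \ge K_2 \Big(\frac{R}{\varepsilon}\Big)^{2p + d'} \tag{2.26} \]

hold for some $R_1$ and $K_2$.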

Given any choice of $R$ as a product of powers of $N$ and $\varepsilon$, the first term in the right-hand side of (2.25) will always be smaller than the second one, if $N$ goes to infinity while $\varepsilon$ is kept fixed; thus we can choose $R$ minimizing the second term under the above conditions. Then the second condition in (2.26) will be fulfilled as an equality:

\[ R = K_3\, \varepsilon\, N^{\frac{1}{2p+d'}}. \]

As for the first condition in (2.26), it can be rewritten as

\[ N \ge N_0\, \varepsilon^{-q\,\frac{2p+d'}{q-p}}, \]

and then, by (2.25),

\[ \mathbb{P}\big[W_p(\hat\mu^N, \mu) > \varepsilon\big] \le \exp\Big(-K_4\, N^{\frac{d'}{2p+d'}}\Big) + K_5\, \varepsilon^{-q}\, N^{\bar\alpha - \alpha + \frac{\alpha p - q}{2p+d'}}. \]

Hence

\[ \mathbb{P}\big[W_p(\hat\mu^N, \mu) > \varepsilon\big] \le \varepsilon^{-q}\, N^{\bar\alpha - \alpha} \tag{2.27} \]

for all $\varepsilon \in (0,1)$ and $N$ larger than some constant and, given $d' > d$, for all $\varepsilon \ge 1$ and $N \ge M \varepsilon^{d'-d}$ where $M$ is large enough.

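In the first case when $p > q/2$, any admissible $\alpha$ belongs to $[1, q/p) \subset [1,2]$, so $\bar\alpha = 1$. If $\delta \in (0, q/p - 1)$, we get from (2.27), with $\alpha = q/p - \delta$, that

\[ \mathbb{P}\big[W_p(\hat\mu^N, \mu) > \varepsilon\big] \le \varepsilon^{-q}\, N^{1 - \frac{q}{p} + \delta} \]

for all $\varepsilon > 0$ and

\[ N \ge N_0 \max\Big(\varepsilon^{-q\,\frac{2p+d'}{q-p}},\ \varepsilon^{d'-d}\Big). \]

In the second case when $p < q/2$, we only consider admissible $\alpha$'s in $[2, q/p) \subset [1, q/p)$, so that $\bar\alpha - \alpha = -\alpha/2$. Choosing $\delta \in (0, q/p - 2)$, we get from (2.27)

\[ \mathbb{P}\big[W_p(\hat\mu^N, \mu) > \varepsilon\big] \le \varepsilon^{-q}\, N^{-\frac{q}{2p} + \frac{\delta}{2}} \]

under the same conditions on $N$ as before. This concludes the argument.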


2.3. Proof of Theorem nn It is again based on the same principles as the proofs of 
Theorems o and m with the help of functional inequalities investigated in jH] and HU. 
We skip the argument, which the reader can easily reconstruct by following the same lines 
as above. 


2.4. Data reconstruction estimates. Finally, we show how the above concentration 
estimates imply data reconstruction estimates. This is a rather general estimate, which is 
treated here along the lines of Section 5] and [23 Problem 10]. 

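Proposition 2.1. Let $\mu$ be a probability measure on $\mathbb{R}^d$, with density $f$ with respect to Lebesgue measure. Let $X_1, \dots, X_N$ be random points in $\mathbb{R}^d$, and let $\zeta$ be a Lipschitz, nonnegative kernel with unit integral. Define the random measure $\hat\mu$ and the random function $\hat f_{\zeta,\alpha}$ by

\[ \hat\mu := \frac1N \sum_{i=1}^N \delta_{X_i}, \qquad \hat f_{\zeta,\alpha}(x) := \frac1N \sum_{i=1}^N \zeta_\alpha(x - X_i), \]

where $\zeta_\alpha(x) := \alpha^{-d}\, \zeta(x/\alpha)$. Then,

\[ \sup_{x \in \mathbb{R}^d} \big|\hat f_{\zeta,\alpha}(x) - f(x)\big| \le \frac{\|\zeta\|_{\mathrm{Lip}}}{\alpha^{d+1}}\, W_1(\hat\mu, \mu) + \delta(\alpha), \tag{2.28} \]

where $\delta$ stands for the modulus of continuity of $f$, defined as

\[ \delta(\varepsilon) := \sup_{|x-y| \le \varepsilon} |f(x) - f(y)|. \]

As a consequence, if $f$ is Lipschitz, then there exist some constants $a, K > 0$, only depending on $d$, $\|f\|_{\mathrm{Lip}}$ and $\|\zeta\|_{\mathrm{Lip}}$, such that

\[ \mathbb{P}\Big[\big\|\hat f_{\zeta, a\varepsilon} - f\big\|_{L^\infty} > \varepsilon\Big] \le \mathbb{P}\big[W_1(\hat\mu, \mu) > K \varepsilon^{d+2}\big] \tag{2.29} \]

for all $\varepsilon > 0$.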
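Proof. First,

\[ \big|\mu * \zeta_\alpha(x) - f(x)\big| = \Big|\int_{\mathbb{R}^d} \zeta_\alpha(x - y)\, \big(f(y) - f(x)\big)\, dy\Big| \le \int_{\mathbb{R}^d} \zeta_\alpha(x-y)\, |f(y) - f(x)|\, dy. \]

Since $\zeta_\alpha(x - y)$ is supported in $\{|x - y| \le \alpha\}$, and $\zeta_\alpha$ is a probability density, we deduce

\[ \big|\mu * \zeta_\alpha(x) - f(x)\big| \le \delta(\alpha). \tag{2.30} \]

Now, if $x$ is some point in $\mathbb{R}^d$, then, thanks to the Kantorovich-Rubinstein dual formulation (1.2),

\begin{align*}
\big|\hat f_{\zeta,\alpha}(x) - \mu * \zeta_\alpha(x)\big|
 &= \Big|\int_{\mathbb{R}^d} \zeta_\alpha(x - y)\, d[\hat\mu - \mu](y)\Big| \\
 &\le \|\zeta_\alpha(x - \cdot)\|_{\mathrm{Lip}}\, W_1(\hat\mu, \mu)
  = \frac{\|\zeta\|_{\mathrm{Lip}}}{\alpha^{d+1}}\, W_1(\hat\mu, \mu).
\end{align*}

To conclude the proof of (2.28), it suffices to combine this bound with (2.30).

Now, let $L := \max(\|f\|_{\mathrm{Lip}}, \|\zeta\|_{\mathrm{Lip}})$, and $\alpha := \varepsilon/(2L)$. The bound (2.28) turns into

\[ \big\|\hat f_{\zeta,\alpha} - f\big\|_{L^\infty} \le L\, \Big(\frac{W_1(\hat\mu, \mu)}{\alpha^{d+1}} + \alpha\Big) = L\, \frac{W_1(\hat\mu,\mu)}{\alpha^{d+1}} + \frac{\varepsilon}{2}. \]

In particular,

\[ \mathbb{P}\Big[\big\|\hat f_{\zeta,\alpha} - f\big\|_{L^\infty} > \varepsilon\Big] \le \mathbb{P}\Big[W_1(\hat\mu, \mu) > \frac{\varepsilon^{d+2}}{(2L)^{d+2}}\Big], \]

which is estimate (2.29). □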


Remark 2.2. Estimate (2.29), combined with Theorem 1.1 or Theorem 1.5, yields simple quantitative (non-asymptotic) deviation inequalities for empirical distribution functions in supremum norm. We refer to Gao for a recent study of deviation inequalities for empirical distribution functions, both in moderate and large deviations regimes.
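To make the objects in Proposition 2.1 concrete, here is a minimal one-dimensional sketch of the estimator $\hat f_{\zeta,\alpha}$. The triangular kernel, the restriction to $d = 1$, and all function names are illustrative assumptions and do not come from the text; the bandwidth rule $\alpha = \varepsilon/(2L)$ is the one used in the proof.

```python
def triangular_kernel(u):
    """Hypothetical kernel zeta on R: nonnegative, 1-Lipschitz,
    unit integral, supported in [-1, 1]."""
    return max(0.0, 1.0 - abs(u))


def rescaled_kernel(x, alpha, d=1):
    """zeta_alpha(x) = alpha^{-d} * zeta(x / alpha); only d = 1 is meaningful here."""
    return triangular_kernel(x / alpha) / alpha ** d


def density_estimate(x, points, alpha):
    """f_hat(x) = (1/N) * sum_i zeta_alpha(x - X_i)."""
    return sum(rescaled_kernel(x - p, alpha) for p in points) / len(points)


def bandwidth(eps, lip_f, lip_zeta):
    """alpha = eps / (2 L) with L = max(||f||_Lip, ||zeta||_Lip)."""
    return eps / (2.0 * max(lip_f, lip_zeta))
```

With this choice of bandwidth, the bias term $\delta(\alpha) \le L\alpha$ in (2.28) is at most $\varepsilon/2$, so the remaining error budget is spent on the $W_1(\hat\mu,\mu)$ term.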
3. PDE estimates

Now we start the study of our model system for interacting particles. The first step towards our proof of Theorem 1.7 consists in deriving suitable a priori estimates on the solution to the nonlinear limit partial differential equation (1.8). In this section, we recall some estimates which have already been established by various authors, and derive some new ones. All estimates will be effective.


3.1. Notation. In the sequel, $\mu_0$ is a probability measure, taken as an initial datum for equation (1.8), and various regularity assumptions will later be made on $\mu_0$. The standing assumptions on $V$ and $W$ will always be made, even if they are not recalled explicitly; we shall only mention additional regularity assumptions, when used in our estimates. Moreover, we shall write

\[ \Gamma := \max(|\gamma|, |\gamma'|). \tag{3.1} \]

The notation $\mu_t$ will always stand for the solution (unique under our assumptions) of (1.8). We also write

\[ e(t) := \int_{\mathbb{R}^d} |x|^2\, d\mu_t(x) \]

for the (kinetic) energy associated with $\mu_t$, and

\[ M_\alpha(t) := \int_{\mathbb{R}^d} e^{\alpha |x|^2}\, d\mu_t(x) \]

for the square exponential moment of order $\alpha$.

The scalar product between two vectors $v, w \in \mathbb{R}^d$ will be denoted by $v \cdot w$. The symbols $C$ and $K$ will often be used to denote various positive constants; in general what will matter is an upper bound on constants denoted $C$, and a lower bound on constants denoted $K$. The space $C^k$ is the space of $k$ times differentiable continuous functions.


3.2. Decay at infinity. In this subsection, we prove the propagation of strong decay estimates at infinity:

Proposition 3.1. With the conventions of Subsection 3.1, let $\eta$ be $\gamma$ if $\gamma < 0$, and an arbitrary negative number otherwise. Let

\[ a := 2(\beta + \eta), \qquad G := 2d + \frac{|\nabla V(0)|^2}{2|\eta|}. \]

Then

(i) \[ e(t) \le e^{-at}\, e(0) + G\, \frac{1 - e^{-at}}{a}. \]

(ii) For any $\alpha_0 > 0$ there is a continuous positive function $\alpha(t)$ such that $\alpha(0) = \alpha_0$ and

\[ M_{\alpha_0}(0) < +\infty \ \Longrightarrow\ M_{\alpha(t)}(t) < +\infty. \tag{3.2} \]
(iii) Moreover, in the "uniformly convex case" when $\beta > 0$ and $\beta + \gamma > 0$, then there is $\alpha > 0$ such that

\[ \sup_{t \ge 0} e(t) < +\infty, \qquad \sup_{t \ge 0} M_\alpha(t) < +\infty. \]

Corollary 3.2. If $\mu_0$ admits a finite square exponential moment, then $\mu_t$ satisfies $T_1(\lambda_t)$, for some function $\lambda_t > 0$, bounded below on any interval $[0,T]$ $(T < \infty)$.

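Proof. We start with (i). For simplicity we shall pretend that $\mu_t$ is a smoothly differentiable function of $t$, with rapid decay, so that all computations based on integrating equation (1.8) against $|x|^2$ are justified. These assumptions are not a priori satisfied, but the resulting bounds can easily be rigorously justified with standard but tedious approximation arguments. With that in mind, we compute

\[ e'(t) = 2d - 2 \int_{\mathbb{R}^d} \big(x \cdot \nabla V(x) + x \cdot \nabla W * \mu_t(x)\big)\, d\mu_t(x) \]

with

\[ -2 \int_{\mathbb{R}^d} x \cdot \nabla V(x)\, d\mu_t(x) \le -2\beta \int_{\mathbb{R}^d} |x|^2\, d\mu_t(x) - 2 \nabla V(0) \cdot \int_{\mathbb{R}^d} x\, d\mu_t(x). \]

Since $\nabla W$ is an odd function, we have

\begin{align*}
-2 \int x \cdot \nabla W * \mu_t(x)\, d\mu_t(x)
 &= -2 \int\!\!\int x \cdot \nabla W(x - y)\, d\mu_t(y)\, d\mu_t(x) \\
 &= - \int\!\!\int (x - y) \cdot \nabla W(x - y)\, d\mu_t(y)\, d\mu_t(x) \\
 &\le -\gamma \int\!\!\int |x - y|^2\, d\mu_t(y)\, d\mu_t(x)
  = -2\gamma \Big[\int |x|^2\, d\mu_t(x) - \Big|\int x\, d\mu_t(x)\Big|^2\Big].
\end{align*}

If $\gamma < 0$, then

\begin{align*}
e'(t) &\le 2d - 2(\gamma + \beta)\, e(t) + 2\gamma \Big|\int x\, d\mu_t(x)\Big|^2 + 2|\gamma| \Big|\int x\, d\mu_t(x)\Big|^2 + \frac{|\nabla V(0)|^2}{2|\gamma|} \\
 &\le 2d - 2(\gamma + \beta)\, e(t) + \frac{|\nabla V(0)|^2}{2|\gamma|};
\end{align*}

and if $\gamma \ge 0$, then for any $\eta < 0$

\begin{align*}
e'(t) &\le 2d - 2\beta\, e(t) + 2|\eta| \Big|\int x\, d\mu_t(x)\Big|^2 + \frac{|\nabla V(0)|^2}{2|\eta|} \\
 &\le 2d - 2(\eta + \beta)\, e(t) + \frac{|\nabla V(0)|^2}{2|\eta|}.
\end{align*}

This leads to

\[ e'(t) \le G - a\, e(t), \]

and the conclusion follows easily by Gronwall's lemma.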
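The Gronwall step can be written out explicitly. The following is a sketch of the standard integrating-factor argument, assuming $a \neq 0$ (for $a = 0$ the quotient is to be read as $t$).

```latex
\[
\frac{d}{dt}\Big(e^{at}\, e(t)\Big) = e^{at}\big(e'(t) + a\, e(t)\big) \le G\, e^{at}
\quad\Longrightarrow\quad
e^{at}\, e(t) - e(0) \le G\, \frac{e^{at} - 1}{a},
\]
that is,
\[
e(t) \le e^{-at}\, e(0) + G\, \frac{1 - e^{-at}}{a}.
\]
```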

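We now turn to (ii). Let $\alpha$ be some arbitrary nonnegative $C^1$ function on $\mathbb{R}_+$. By using the equation (1.8) we compute

\[ \frac{d}{dt} \int e^{\alpha(t)|x|^2}\, d\mu_t(x) = \int \Big[2\alpha d + 4\alpha^2 |x|^2 + \alpha' |x|^2 - 2\alpha\, x \cdot \nabla V(x) - 2\alpha\, x \cdot \nabla W * \mu_t(x)\Big]\, e^{\alpha |x|^2}\, d\mu_t(x). \]

Since $D^2 V(x) \ge \beta I$ for all $x \in \mathbb{R}^d$, we can write

\[ -x \cdot \nabla V(x) \le -x \cdot \nabla V(0) - \beta |x|^2 \le -\beta |x|^2 + |\nabla V(0)|\,|x| \le (\delta - \beta)\,|x|^2 + \frac{|\nabla V(0)|^2}{4\delta} \tag{3.3} \]

for any $\delta > 0$ and $x \in \mathbb{R}^d$.

Next, our assumptions on $W$ imply $\nabla W(0) = 0$, and $\gamma I \le D^2 W(x) \le \gamma' I$, so

\[ x \cdot \nabla W(x) \ge \gamma |x|^2 \qquad \text{and} \qquad |x \cdot D^2 W(z)\, y| \le \Gamma\, |x|\,|y| \]

for all $x, y, z \in \mathbb{R}^d$, with $\Gamma$ defined by (3.1). Hence, by Taylor's formula,

\begin{align*}
-x \cdot \nabla W * \mu_t(x)
 &= -\int_{\mathbb{R}^d} x \cdot \nabla W(x - y)\, d\mu_t(y) \\
 &= -x \cdot \nabla W(x) + \int_{\mathbb{R}^d} \int_0^1 x \cdot D^2 W(x - s y)\, y\; ds\, d\mu_t(y) \\
 &\le -\gamma |x|^2 + \Gamma |x| \int_{\mathbb{R}^d} |y|\, d\mu_t(y) \\
 &\le (-\gamma + \Gamma \eta)\, |x|^2 + \frac{\Gamma}{4\eta}\, e(t), \tag{3.4}
\end{align*}

where $\eta$ is any positive number.

From (3.3) and (3.4) we obtain

\[ \frac{d}{dt}\big(M_{\alpha(t)}(t)\big) \le \int_{\mathbb{R}^d} \big[A(t) + B(t)\,|x|^2\big]\, e^{\alpha(t)|x|^2}\, d\mu_t(x) \tag{3.5} \]

where

\[ A(t) = C \alpha(t)\, (1 + e(t)), \qquad B(t) = \alpha'(t) + 4\alpha(t)^2 + b\, \alpha(t), \]

and $C$ is a finite constant, while $b = -2(\gamma + \beta - \delta - \Gamma \eta)$.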

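We now choose $\alpha(t)$ in such a way that $B(t) = 0$, i.e.

\[ \alpha'(t) + 4\alpha^2(t) + b\, \alpha(t) = 0, \qquad \alpha(0) = \alpha_0. \]

This integrates to

\[ \alpha(t) = e^{-bt} \Big(\frac{1}{\alpha_0} + 4\, \frac{1 - e^{-bt}}{b}\Big)^{-1} \quad (b \ne 0), \qquad \alpha(t) = \Big(\frac{1}{\alpha_0} + 4t\Big)^{-1} \quad \text{if } b = 0. \]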
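The closed form for $\alpha(t)$ is easy to sanity-check numerically. The following sketch evaluates the formula and estimates the residual of the ODE $\alpha' + 4\alpha^2 + b\alpha = 0$ by a central difference; the function names and the step size `h` are illustrative choices, not part of the text.

```python
import math


def alpha(t, alpha0, b):
    """Closed-form solution of alpha' + 4 alpha^2 + b alpha = 0 with alpha(0) = alpha0."""
    if b == 0.0:
        return 1.0 / (1.0 / alpha0 + 4.0 * t)
    decay = math.exp(-b * t)
    return decay / (1.0 / alpha0 + 4.0 * (1.0 - decay) / b)


def ode_residual(t, alpha0, b, h=1e-6):
    """Central-difference estimate of alpha'(t) + 4 alpha(t)^2 + b alpha(t)."""
    derivative = (alpha(t + h, alpha0, b) - alpha(t - h, alpha0, b)) / (2.0 * h)
    value = alpha(t, alpha0, b)
    return derivative + 4.0 * value * value + b * value
```

The substitution $u = 1/\alpha$ turns the equation into the linear ODE $u' = 4 + bu$, which is where the formula comes from.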


Obviously $\alpha$ is a continuous positive function, and our estimates imply

\[ \frac{d}{dt}\, M_{\alpha(t)}(t) \le A(t)\, M_{\alpha(t)}(t). \]

We conclude by using Gronwall's lemma that

\[ M_{\alpha(t)}(t) \le \exp\Big(\int_0^t A(s)\, ds\Big)\, M_{\alpha_0}(0). \]

Next, the estimate (iii) for $e(t)$ is an easy consequence of our explicit estimates when $\beta > 0$, $\beta + \gamma > 0$ (in the case when $\gamma \ge 0$ and $\beta > 0$, we choose $\eta \in (-\beta, 0)$).

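As for the estimate about $M_\alpha(t)$, it will result from a slightly more precise computation. From (3.5), with $\alpha$ now a fixed constant, we have

\[ \frac{d}{dt} \int e^{\alpha |x|^2}\, d\mu_t(x) \le \int \big[A(t) + B\, |x|^2\big]\, e^{\alpha |x|^2}\, d\mu_t(x) \tag{3.6} \]

where $A$ is bounded on $\mathbb{R}_+$ by some constant $a$, and

\[ B = 2\alpha\, \big[2\alpha - (\beta + \gamma - \delta - \Gamma \eta)\big]. \]

Since $\beta + \gamma > 0$, for any fixed $\alpha$ in $\big(0, \frac{\beta+\gamma}{2}\big)$ we can choose $\delta, \eta > 0$ such that $B < 0$. Letting $R^2 = -a/B$ and $G = -B > 0$, equation (3.6) becomes

\[ \frac{d}{dt} \int e^{\alpha |x|^2}\, d\mu_t(x) \le G \int (R^2 - |x|^2)\, e^{\alpha |x|^2}\, d\mu_t(x). \tag{3.7} \]

Let $\rho > 1$. The bound

\[ \int_{\{|x| > \rho R\}} (R^2 - |x|^2)\, e^{\alpha|x|^2}\, d\mu_t(x) \le R^2 (1 - \rho^2) \int_{\{|x| > \rho R\}} e^{\alpha |x|^2}\, d\mu_t(x) \]

leads to

\[ \int_{\mathbb{R}^d} (R^2 - |x|^2)\, e^{\alpha|x|^2}\, d\mu_t(x) \le \int_{\{|x| \le \rho R\}} (R^2 \rho^2 - |x|^2)\, e^{\alpha|x|^2}\, d\mu_t(x) + R^2 (1 - \rho^2)\, M_\alpha(t) \]

by decomposing the integral on the sets $\{|x| \le \rho R\}$ and $\{|x| > \rho R\}$. From (3.7) we deduce

\[ (M_\alpha)'(t) + \omega_1 M_\alpha(t) \le \omega_2 \]

where $\omega_1$ and $\omega_2$ are positive constants. It follows that $M_\alpha(t)$ remains bounded on $\mathbb{R}_+$ if $M_\alpha(0) < +\infty$, and this concludes the argument. □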


3.3. Time-regularity. Now we study the time-regularity of $\mu_t$.

Proposition 3.3. With the conventions of Subsection 3.1, for any $T < +\infty$ there exists a constant $C(T)$ such that

\[ \forall s, t \in [0,T], \qquad W_2(\mu_t, \mu_s) \le C(T)\, |t - s|^{1/2}. \tag{3.8} \]

Remark 3.4. The exponent 1/2 is natural in small time if no regularity assumption is made on $\mu_0$; it can be improved if $t, s$ are assumed to be bounded below by some $t_0 > 0$. Also, in view of the results of convergence to equilibrium recalled later on, the constant $C(T)$ might be chosen independent of $T$ if $\beta > 0$, $\beta + 2\gamma > 0$.

Remark 3.5. A stochastic proof of (3.8) is possible, via the study of continuity estimates for $Y_t$, which in any case will be useful later on. But here we prefer to present an analytical proof, to stress the fact that estimates in this section are purely analytical statements.

Proof. Let L be the linear operator —A — V ■ (-Vld + WfW * /i*)), and let be the 
associated semigroup: from our assumptions and estimates it follows that it is well-defined, 
at least for initial data which admit a hnite square exponential moment. Of course fit = 
It follows that 

II^2(hs,/Wi) = = ^ 2(1 6ydfis{y), [ 

< [ W2{5y,e-^^-^^^5y)dfrs{y). 

jRd 


Our goal is to bound this by 0{^/t — s). In view of Proposition EUl it is sufficient to prove 
that for all a > 0 , 

Wi{5y, e-^^-^^^6y) = 0{t - s) 

This estimate is rather easy, since the left-hand side is just the variance of the solution 
of a linear diffusion equation, starting with a Dirac mass at y as initial datum. Without 
loss of generality, we assume s = 0, and write Jit := e~^^6y. For simplicity we write the 
computations in a sketchy way, but they are not hard to justify. 

Since the initial datum is 6y, its square exponential moment Ma of order a is . With 
an argument similar to the proof of Proposition 13. If hi. one can show that 

0 < t < T =» [ djltix) < C{T){1 + M^) < C{T) 


Now, since |VD|(a;) = a < a, \VW * p/ grows at most polynomially, and fit 


admits a square exponential moment of order a, we easily obtain 




V(I/ + W * fit) djlt = I 0(e“l"l') djlt = 




A J\^djlt = d- J x-ViV + W*fit)dJlt = 0{e^\^\"). 

From these estimates we deduce that the time-derivative of the variance V(Jit) := f \x\‘^ djlt— 
(f xdjlt)'^ is bounded by for any 6 > 0. Since Jlo has zero variance, it follows that 

the variance of Jit is ), which was our goal. □ 
3.4. Regularity in phase space. Regularity estimates will be useful for Theorem 1.11. Equation (1.8) is a (weakly nonlinear) parabolic equation, for which regularization effects can be studied by standard tools. Some limits to the strength of the regularization are imposed by the regularity of $V$. So as not to be bothered by these nonessential considerations, we shall assume strong regularity conditions on $V$ here. Then in Appendix B we shall prove the following estimates:

Proposition 3.6. With the conventions of Subsection 3.1, assume in addition that $V$ has all its derivatives growing at most polynomially at infinity. Then, for each $k \ge 0$ and for all $t_0 > 0$, $T > t_0$ there is a finite constant $C(t_0, T)$, only depending on $t_0$, $T$, $k$ and a square exponential moment of the initial measure $\mu_0$, such that the density $f_t$ of $\mu_t$ is of class $C^k$, with

\[ \sup_{t_0 \le t \le T} \|f_t\|_{C^k} \le C(t_0, T). \]

If moreover $\beta > 0$, $\beta + \gamma > 0$, then $C(t_0, T)$ can be chosen to be independent of $T$ for any fixed $t_0$.

Remark 3.7. For regular initial data and under some adequate assumptions on $V$ and $W$, some regularity estimates on $f_t/f_\infty$, where $f_\infty$ is the limit density in large time, are established in Lemma 6.7 of the cited reference. These estimates allow a much more precise uniform decay, but are limited to just one derivative. Here there will be no need for them.

3.5. Asymptotic behavior. In the "uniformly convex" case when $\beta + \gamma > 0$, the measure $\mu_t$ converges to a definite limit $\mu_\infty$ as $t \to \infty$. This was investigated in the references cited in the introduction. The following statement is a simple variant of Theorems 2.1 and 5.1 of the cited reference.

Proposition 3.8. With the conventions of Subsection 3.1, assuming that $\beta > 0$, $\beta + 2\gamma > 0$, there exists a probability measure $\mu_\infty$ such that

\[ W_2(\mu_t, \mu_\infty) \le C e^{-\lambda t}, \qquad \lambda > 0. \]

Here the constants $C$ and $\lambda$ only depend on the initial datum $\mu_0$.


4. The limit empirical measure

Consider the random time-dependent measure

\[ \hat\nu_t^N := \frac1N \sum_{i=1}^N \delta_{Y_t^i}, \tag{4.1} \]

where $(Y_t^i)_{t \ge 0}$, $1 \le i \le N$, are $N$ independent processes solving the same stochastic differential equation

\[ dY_t^i = \sqrt{2}\, dB_t^i - \big[\nabla(V + W * \mu_t)\big](Y_t^i)\, dt, \]

and such that the law of $Y_0^i$ is $\mu_0$. As we already mentioned, for each $t$ and $i$, $Y_t^i$ is distributed according to the law $\mu_t$. We call $\hat\nu_t^N$ the "limit empirical measure" because it is expected to be a rather accurate description, in some well-chosen sense, of the empirical measure $\hat\mu_t^N$ as $N \to \infty$.

Our estimates on $\mu_t$, and the fact that $\hat\nu_t^N$ is the empirical measure for independent processes, are sufficient to imply good properties of concentration of $\hat\nu_t^N$ around its mean $\mu_t$, as $N \to \infty$, for each $t$. But later on we shall use some estimates about the time-dependent measure $\hat\nu_t^N$ (even to obtain a result of concentration for $\hat\mu_t^N$ with fixed $t$). To get such results, we shall study the time-regularity of $\hat\nu_t^N$. Our final goal in this section is the following

Proposition 4.1. With the conventions of Subsection 3.1, for any $T \ge 0$ there are constants $C = C(T)$ and $a = a(T) > 0$ such that the limit empirical measure (4.1) satisfies

\[ \forall \Delta \in [0,T],\ \forall \varepsilon > 0, \qquad \mathbb{P}\Big[\sup_{t_0 \le s, t \le t_0 + \Delta} W_1(\hat\nu_s^N, \hat\nu_t^N) > \varepsilon\Big] \le \exp\big(-N(a \varepsilon^2 - C\Delta)\big). \]

To prove Proposition 4.1, we shall use a few classical tools of stochastic calculus.


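4.1. SDE estimates. In this subsection we establish the following estimates of time regularity for the stochastic process $Y_t$. For all $T > 0$, there exist positive constants $a$ and $C$ such that, for all $s, t, t_0, \Delta \in [0,T]$,

(i) $\mathbb{E}|Y_t - Y_s|^2 \le C\,|t - s|$;

(ii) $\mathbb{E}|Y_t - Y_s|^4 \le C\,|t - s|^2$;

(iii) $\displaystyle \mathbb{E}\Big[\sup_{t_0 \le s \le t \le t_0 + \Delta} \exp\big(a\,|Y_t - Y_s|^2\big)\Big] \le 1 + C\Delta$.

Proof. We start with (i). We use Itô's formula to write a stochastic equation on the process $(|Y_t - Y_s|^2)_{t \ge s}$:

\[ |Y_t - Y_s|^2 = M_{s,t} + 2d\,(t - s) - 2 \int_s^t \big(\nabla V(Y_u) + \nabla W * \mu_u(Y_u)\big) \cdot (Y_u - Y_s)\, du, \]

where $M_{s,t}$, viewed as a process depending on $t$, is a martingale with zero expectation. Hence

\[ \mathbb{E}|Y_t - Y_s|^2 = 2d\,(t - s) - 2 \int_s^t \mathbb{E}\, \big(\nabla V(Y_u) + \nabla W * \mu_u(Y_u)\big) \cdot (Y_u - Y_s)\, du. \tag{4.2} \]

On one hand,

\[ \Big(\mathbb{E}\big|\big(\nabla V(Y_u) + \nabla W * \mu_u(Y_u)\big) \cdot (Y_u - Y_s)\big|\Big)^2 \le 4\, \big(\mathbb{E}|\nabla V(Y_u)|^2 + \mathbb{E}|\nabla W * \mu_u(Y_u)|^2\big)\, \big(\mathbb{E}|Y_u|^2 + \mathbb{E}|Y_s|^2\big). \tag{4.3} \]

On the other hand, by Proposition 3.1, $\mu_u$ has a finite square exponential moment, uniformly bounded for $u \in [0,T]$. More precisely, there exist $\alpha > 0$ and $M < +\infty$ such that $\int e^{\alpha |x|^2}\, d\mu_u(x) \le M$ for all $u \le T$. Since by assumption $|\nabla W(z)| \le \Gamma |z|$ and $|\nabla V(x)| = O(e^{a|x|^2})$ for any $a > 0$, we deduce

\[ \sup_{0 \le s \le u \le T} \mathbb{E}\big|\big(\nabla V(Y_u) + \nabla W * \mu_u(Y_u)\big) \cdot (Y_u - Y_s)\big| < +\infty. \]

In view of (4.2), it follows that there exists a constant $C = C(T)$ such that

\[ \mathbb{E}|Y_t - Y_s|^2 \le (2d + C)\,(t - s). \]

This concludes the proof of (i).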

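To establish (ii), we perform a very similar computation. For given $s$, let $Z_{s,t} := |Y_t - Y_s|^4$. Another application of Itô's formula yields

\[ \mathbb{E} Z_{s,t} = 4(2 + d) \int_s^t \mathbb{E}|Y_u - Y_s|^2\, du - 4 \int_s^t \mathbb{E}\, |Y_u - Y_s|^2\, (Y_u - Y_s) \cdot \big(\nabla V(Y_u) + \nabla W * \mu_u(Y_u)\big)\, du. \]

On one hand, from (i),

\[ \int_s^t \mathbb{E}|Y_u - Y_s|^2\, du \le 2C \int_s^t (u - s)\, du = C\,(t - s)^2. \]

On the other hand,

\[ \Big|\int_s^t \mathbb{E}\, |Y_u - Y_s|^2\, (Y_u - Y_s) \cdot \big(\nabla V(Y_u) + \nabla W * \mu_u(Y_u)\big)\, du\Big| \le \Big(\int_s^t \mathbb{E} Z_{s,u}\, du\Big)^{3/4} \Big(\int_s^t \mathbb{E}\, \big|\nabla V(Y_u) + \nabla W * \mu_u(Y_u)\big|^4\, du\Big)^{1/4} \tag{4.4} \]

by Hölder's inequality. But again, since the measures $\mu_t$ admit a bounded square exponential moment, $\mathbb{E}|\nabla V(Y_u) + \nabla W * \mu_u(Y_u)|^4$ is bounded on $[0,T]$. We conclude that

\[ \mathbb{E} Z_{s,t} \le C\, \Big((t - s)^2 + (t - s)^{1/4} \Big(\int_s^t \mathbb{E} Z_{s,u}\, du\Big)^{3/4}\Big). \tag{4.5} \]

Then, with $C$ standing again for various constants which are independent of $s$ and $t$,

\[ \mathbb{E} Z_{s,u} \le 8\, \big(\mathbb{E}|Y_u|^4 + \mathbb{E}|Y_s|^4\big) \le 16 \sup_{0 \le t \le T} \int |x|^4\, d\mu_t(x) \le C; \]

so, from (4.5),

\[ \mathbb{E} Z_{s,t} \le C\, \big((t - s)^2 + (t - s)^{1/4} (t - s)^{3/4}\big) \le C\,(t - s), \]

and by (4.5) again we successively obtain

\[ \mathbb{E} Z_{s,t} \le C\,(t - s)^{7/4}, \]

and finally

\[ \mathbb{E} Z_{s,t} \le C\,(t - s)^2. \]

This concludes the proof of (ii).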

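We finally turn to the proof of (iii). Without real loss of generality, we set $t_0 = 0$. We shall proceed as in the proof of Proposition 3.1, and prove the existence of some constant $C$ and some continuous positive function $a$ on $\mathbb{R}_+$ such that

\[ \mathbb{E}\Big(\sup_{0 \le s \le t \le \Delta \le T} \exp\big(a(t)\,|Y_t - Y_s|^2\big)\Big) \le 1 + C\Delta. \tag{4.6} \]

Let $a(t)$ be a smooth function, and $Z_{s,t} := \exp\big(a(t)\,|Y_t - Y_s|^2\big)$. By Itô's formula,

\[ Z_{s,t} = 1 + M_{s,t} + \int_s^t \Big[2a(u)\, \Big(d + 2a(u)\,|Y_u - Y_s|^2 - (\nabla V + \nabla W * \mu_u)(Y_u) \cdot (Y_u - Y_s)\Big) + a'(u)\,|Y_u - Y_s|^2\Big]\, Z_{s,u}\, du, \]

where

\[ M_{s,t} := 2\sqrt{2} \int_s^t a(u)\, (Y_u - Y_s)\, Z_{s,u} \cdot dB_u. \]

For each $s$, $M_{s,t}$, viewed as a stochastic process in $t$, is a martingale.

By Young's inequality, for any $b > 0$,

\[ -2\, (\nabla V + \nabla W * \mu_u)(Y_u) \cdot (Y_u - Y_s) \le b\, |Y_u - Y_s|^2 + \frac1b\, |\nabla V + \nabla W * \mu_u|^2 (Y_u). \]

So, by letting

\[ A_u := a(u)\, \Big[2d + \frac1b\, \big|\nabla V(Y_u) + \nabla W * \mu_u(Y_u)\big|^2\Big] \]

and

\[ B(u) := a'(u) + 4a^2(u) + b\, a(u), \]

we obtain

\[ Z_{s,t} \le 1 + M_{s,t} + \int_s^t \big[A_u + B(u)\,|Y_u - Y_s|^2\big]\, Z_{s,u}\, du. \]

We choose $a$ in such a way that the function $B$ is identically zero, that is

\[ a(u) = e^{-bu}\, \Big(\frac{1}{a(0)} + 4\, \frac{1 - e^{-bu}}{b}\Big)^{-1}, \]

where $a(0)$ is to be fixed later. Then

\[ Z_{s,t} \le 1 + M_{s,t} + \int_s^t A_u\, Z_{s,u}\, du, \]

from which it is clear that

\[ \mathbb{E} \sup_{s \le t \le \Delta} Z_{s,t} \le 1 + \mathbb{E} \sup_{s \le t \le \Delta} M_{s,t} + \int_s^{\Delta} \mathbb{E}\, (A_u Z_{s,u})\, du. \tag{4.7} \]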
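By the Cauchy-Schwarz and Doob inequalities,

\[ \mathbb{E} \sup_{s \le t \le \Delta} M_{s,t} \le \Big(\mathbb{E}\Big(\sup_{s \le t \le \Delta} M_{s,t}\Big)^2\Big)^{1/2} \le 2 \sup_{s \le t \le \Delta} \big(\mathbb{E}|M_{s,t}|^2\big)^{1/2}. \tag{4.8} \]

Also, by Itô's formula and the Cauchy-Schwarz inequality again,

\[ \mathbb{E}|M_{s,t}|^2 = 8 \int_s^t a(u)^2\, \mathbb{E}\, |Y_u - Y_s|^2 Z_{s,u}^2\, du \le 8 \int_s^t a(u)^2\, \big(\mathbb{E}|Y_u - Y_s|^4\big)^{1/2}\, \big(\mathbb{E} Z_{s,u}^4\big)^{1/2}\, du. \tag{4.9} \]

In view of (ii), there exists a constant $C$ such that

\[ \mathbb{E}|Y_u - Y_s|^4 \le C\,(u - s)^2. \tag{4.10} \]

Furthermore,

\[ \mathbb{E} Z_{s,u}^4 = \mathbb{E} \exp\big(4a(u)\,|Y_u - Y_s|^2\big) \le \big(\mathbb{E} \exp 16\, a(u)\,|Y_u|^2\big)^{1/2}\, \big(\mathbb{E} \exp 16\, a(u)\,|Y_s|^2\big)^{1/2}. \tag{4.11} \]

Recall from Proposition 3.1 that there exist constants $M$ and $\alpha > 0$ such that

\[ \sup_{s \le u \le \Delta} \int e^{\alpha |y|^2}\, d\mu_u(y) \le M. \]

If we choose $a(0) \le \alpha/16$, the decreasing property of $a$ will ensure that $a(u) \le \alpha/16$ for all $u \in [0, \Delta]$, and

\[ \mathbb{E} \exp 16\, a(u)\,|Y_u|^2 \le \int e^{\alpha |y|^2}\, d\mu_u(y) \le M. \]

Then, from (4.11),

\[ \sup_{s \le u \le \Delta} \mathbb{E} Z_{s,u}^4 \le M. \]

Now, from (4.9) and (4.10) we deduce

\[ \mathbb{E}|M_{s,t}|^2 \le C\,(t - s)^2 \qquad (s \le t \le \Delta). \]

Combining this with (4.8), we conclude that

\[ \mathbb{E} \sup_{s \le t \le \Delta} M_{s,t} \le C \Delta. \]

In the same way, we can prove that $\mathbb{E}\,(A_t Z_{s,t})$ is bounded for $t \in [s, \Delta]$ by bounding $\mathbb{E} Z_{s,t}^2$ and $\mathbb{E} A_t^2$. This concludes the proof of (4.6), and therefore of (iii) above. □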
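4.2. Time-regularity of the limit empirical measure. We are now ready to prove Proposition 4.1. On one hand,

\[ W_1(\hat\nu_s^N, \hat\nu_t^N) \le \frac1N \sum_{i=1}^N |Y_t^i - Y_s^i|, \]

so

\[ \mathbb{P}\Big[\sup_{0 \le s \le t \le \Delta} W_1(\hat\nu_s^N, \hat\nu_t^N) > \varepsilon\Big] \le \mathbb{P}\Big[\frac1N \sum_{i=1}^N V^i > \varepsilon\Big], \tag{4.12} \]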
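The first inequality above simply says that pairing the $i$-th atom of one empirical measure with the $i$-th atom of the other is an admissible (generally suboptimal) coupling. A small one-dimensional sketch illustrates this; it relies on the standard fact that, on the real line, $W_1$ between two equal-weight empirical measures with the same number of atoms is attained by the monotone pairing of sorted samples. The function names are illustrative.

```python
def w1_empirical_1d(xs, ys):
    """W_1 between two equal-weight empirical measures on R with the same
    number of atoms, computed through the monotone (sorted) pairing."""
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def index_coupling_cost(xs, ys):
    """Cost (1/N) * sum_i |x_i - y_i| of pairing atom i with atom i."""
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)
```

For instance, with atoms `[0.0, 1.0, 2.0]` and `[2.0, 0.0, 1.0]` the two empirical measures coincide, so the distance is zero, while the index pairing has a strictly positive cost.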


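where

\[ V^i := \sup_{0 \le s \le t \le \Delta} |Y_t^i - Y_s^i|. \]

By Chebyshev's exponential inequality and the independence of the $(Y_t^i - Y_s^i)$,

\[ \mathbb{P}\Big[\frac1N \sum_{i=1}^N V^i > \varepsilon\Big] \le \exp\Big(-N \sup_{\zeta \ge 0} \big[\varepsilon\zeta - \log \mathbb{E} \exp(\zeta V^1)\big]\Big). \]

But, for any given $\zeta$ and $\omega > 0$,

\[ \mathbb{E} \exp(\zeta V^1) \le \mathbb{E} \exp\Big(\zeta\, \frac{\omega^2 + (V^1)^2}{2\omega}\Big) = e^{\zeta\omega/2}\ \mathbb{E} \exp\Big(\frac{\zeta}{2\omega}\, (V^1)^2\Big). \]

Let $\omega := \dfrac{\zeta}{2a}$, so that $\dfrac{\zeta}{2\omega} = a$. Then, from estimate (iii) in Subsection 4.1,

\[ \mathbb{E} \exp\Big(\frac{\zeta}{2\omega}\, (V^1)^2\Big) \le 1 + C\Delta, \]

uniformly in $s$ and $\Delta$. Hence, for any $\zeta \ge 0$,

\[ \mathbb{E} \exp(\zeta V^1) \le \exp\Big(\frac{\zeta^2}{4a}\Big)\, (1 + C\Delta). \]

Consequently,

\begin{align*}
\mathbb{P}\Big[\frac1N \sum_{i=1}^N V^i > \varepsilon\Big]
 &\le \exp\Big(-N \sup_{\zeta \ge 0} \Big[\varepsilon\zeta - \frac{\zeta^2}{4a} - \log(1 + C\Delta)\Big]\Big) \\
 &= \exp\big(-N\,[a\varepsilon^2 - \log(1 + C\Delta)]\big) \\
 &\le \exp\big(-N\,[a\varepsilon^2 - C\Delta]\big).
\end{align*}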

The proof of Proposition 4.1 follows by (4.12).
5. Coupling

We now (as is classical) reduce the proof of convergence for $\hat\mu^N_t$ to a proof of convergence for the empirical measure $\hat\nu^N_t$ constructed on the auxiliary independent system $(Y^i_t)$. The final goal of this section is the following estimate.

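Proposition 5.1. With the conventions of Subsection AS,

$$W_1(\hat\mu^N_t, \mu_t) \le \Gamma \int_0^t e^{-\alpha(t-s)}\, W_1(\hat\nu^N_s, \mu_s)\, ds + W_1(\hat\nu^N_t, \mu_t),$$

where $\Gamma$ is defined by (EU), and $\alpha := \beta + 2\min(\gamma, 0)$.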

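Proof. For the sake of simplicity we give a slightly sketchy proof. We couple the stochastic systems $(X^i_t)$ and $(Y^i_t)$ by assuming that (i) $X^i_0 = Y^i_0$ and (ii) both systems are driven by the same Brownian processes $B^i_t$. In particular, for each $i \in \{1, \dots, N\}$, the process $X^i_t - Y^i_t$ satisfies the equation

$$d(X^i_t - Y^i_t) = -\big(\nabla V(X^i_t) - \nabla V(Y^i_t)\big)\, dt - \big(\nabla W * \hat\mu^N_t(X^i_t) - \nabla W * \mu_t(Y^i_t)\big)\, dt. \tag{5.1}$$

From (5.1) we deduce

$$\frac{1}{2} \frac{d}{dt} |X^i_t - Y^i_t|^2 = -\big(\nabla V(X^i_t) - \nabla V(Y^i_t)\big) \cdot (X^i_t - Y^i_t) - \big(\nabla W * \hat\mu^N_t(X^i_t) - \nabla W * \mu_t(Y^i_t)\big) \cdot (X^i_t - Y^i_t). \tag{5.2}$$

Our convexity assumption on $V$ implies

$$-\big(\nabla V(X^i_t) - \nabla V(Y^i_t)\big) \cdot (X^i_t - Y^i_t) \le -\beta\, |X^i_t - Y^i_t|^2;$$

so the main issue consists in the treatment of the quantity $\nabla W * \hat\mu^N_t(X^i_t) - \nabla W * \mu_t(Y^i_t)$ appearing in the right-hand side of (5.2). There are (at least) two options here. The first one consists in writing

$$\nabla W * \hat\mu^N_t(X^i_t) - \nabla W * \mu_t(Y^i_t) = \big(\nabla W * \hat\mu^N_t - \nabla W * \mu_t\big)(X^i_t) + \big(\nabla W * \mu_t(X^i_t) - \nabla W * \mu_t(Y^i_t)\big); \tag{5.3}$$

while the second one consists in forcing the introduction of $\hat\nu^N_t$ as follows:

$$\nabla W * \hat\mu^N_t(X^i_t) - \nabla W * \mu_t(Y^i_t) = \frac{1}{N} \sum_{j=1}^N \big[\nabla W(X^i_t - X^j_t) - \nabla W(Y^i_t - Y^j_t)\big] - \big(\nabla W * \mu_t - \nabla W * \hat\nu^N_t\big)(Y^i_t). \tag{5.4}$$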

Both options are interesting and lead to slightly different computations. Since both lines 
of computations might be useful in other contexts, we shall sketch them one after the 
other. The second option leads to better bounds, but at the price of more complications 
(in particular, we shall need to sum over the index i at an early stage). 
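First option: We start as in (5.3). In view of our assumption on $D^2 W$, the Lipschitz norm of $\nabla W(X^i_t - \cdot)$ is bounded by $\Gamma$. Therefore, by the Kantorovich-Rubinstein dual formulation of $W_1$,

$$\big|(\nabla W * \hat\mu^N_t - \nabla W * \mu_t)(X^i_t)\big| = \Big|\int \nabla W(X^i_t - y)\, d(\hat\mu^N_t - \mu_t)(y)\Big| \le \Gamma\, W_1(\hat\mu^N_t, \mu_t),$$

and then our assumptions on $V$ and $W$ imply

$$\frac{1}{2} \frac{d}{dt} |X^i_t - Y^i_t|^2 \le -(\gamma + \beta)\, |X^i_t - Y^i_t|^2 + \Gamma\, W_1(\hat\mu^N_t, \mu_t)\, |X^i_t - Y^i_t|.$$

In other words, $|X^i_t - Y^i_t|$ satisfies the differential inequality

$$\frac{d}{dt} |X^i_t - Y^i_t| + (\beta + \gamma)\, |X^i_t - Y^i_t| \le \Gamma\, W_1(\hat\mu^N_t, \mu_t)$$

($X^i_t$ and $Y^i_t$ separately are not Lipschitz functions of $t$, but their difference is). Hence, by Gronwall's lemma,

$$|X^i_t - Y^i_t| \le \Gamma \int_0^t e^{-(\beta+\gamma)(t-s)}\, W_1(\hat\mu^N_s, \mu_s)\, ds.$$

Now we sum over $i$; by convexity of the distance $W_1$ and triangular inequality, we obtain

$$W_1(\hat\mu^N_t, \hat\nu^N_t) \le \frac{1}{N} \sum_{i=1}^N |X^i_t - Y^i_t| \le \Gamma \int_0^t e^{-(\beta+\gamma)(t-s)}\, W_1(\hat\mu^N_s, \mu_s)\, ds \le \Gamma \int_0^t e^{-(\beta+\gamma)(t-s)} \big[W_1(\hat\mu^N_s, \hat\nu^N_s) + W_1(\hat\nu^N_s, \mu_s)\big]\, ds.$$

By using Gronwall's lemma again, we deduce

$$W_1(\hat\mu^N_t, \hat\nu^N_t) \le \Gamma \int_0^t e^{-(\beta+\gamma-\Gamma)(t-s)}\, W_1(\hat\nu^N_s, \mu_s)\, ds.$$

By applying the triangular inequality for $W_1$, we conclude to the validity of Proposition 5.1, only with $\alpha$ replaced by the (a priori smaller) quantity $\beta + \gamma - \Gamma$.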
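
The second use of Gronwall's lemma in the first option can be illustrated numerically. The Python sketch below (an illustration under stated assumptions, not the authors' computation) takes the equality case $u(t) = \Gamma \int_0^t e^{-c(t-s)} (u(s) + f(s))\, ds$ with $c$ standing for $\beta + \gamma$ and a constant forcing $f \equiv 1$, integrates the equivalent ODE $u' = -c u + \Gamma(u + f)$ by explicit Euler, and compares with the closed form $\Gamma \int_0^t e^{-(c - \Gamma)(t-s)}\, ds$. The names `gamma_cap`, `c` and all numerical values are hypothetical.

```python
import math

def solve_closed_loop(gamma_cap, c, f, t_end, n_steps):
    """Explicit Euler for u' = -c*u + gamma_cap*(u + f(t)), u(0) = 0."""
    dt = t_end / n_steps
    u, t = 0.0, 0.0
    for _ in range(n_steps):
        u += dt * (-c * u + gamma_cap * (u + f(t)))
        t += dt
    return u

def gronwall_bound(gamma_cap, c, t_end):
    """gamma_cap * int_0^t exp(-(c - gamma_cap)(t - s)) ds, for f = 1."""
    rate = c - gamma_cap
    return gamma_cap * (1.0 - math.exp(-rate * t_end)) / rate

G, c, T = 0.5, 2.0, 3.0      # hypothetical values with c > G
u_T = solve_closed_loop(G, c, lambda t: 1.0, T, 200_000)
bound_T = gronwall_bound(G, c, T)
```

In this equality case the two numbers coincide up to the Euler discretization error, which shows that the decay rate is indeed degraded from $\beta + \gamma$ to $\beta + \gamma - \Gamma$.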

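Second option: Now we start with (5.4). This time we sum over $i$ right from the beginning:

$$\frac{1}{2} \frac{d}{dt} \sum_{i=1}^N |X^i_t - Y^i_t|^2 = -\sum_{i=1}^N \big(\nabla V(X^i_t) - \nabla V(Y^i_t)\big) \cdot (X^i_t - Y^i_t) - \frac{1}{N} \sum_{i,j=1}^N A^{ij}_t - \sum_{i=1}^N B^i_t,$$

where

$$A^{ij}_t = \big(\nabla W(X^i_t - X^j_t) - \nabla W(Y^i_t - Y^j_t)\big) \cdot (X^i_t - Y^i_t)$$

and

$$B^i_t = \big(\nabla W * \hat\nu^N_t(Y^i_t) - \nabla W * \mu_t(Y^i_t)\big) \cdot (X^i_t - Y^i_t).$$

Since $\nabla W$ is an odd function and $D^2 W(x) \ge \gamma I$ for all $x \in \mathbb{R}^d$, we have

$$A^{ij}_t + A^{ji}_t = \big(\nabla W(X^i_t - X^j_t) - \nabla W(Y^i_t - Y^j_t)\big) \cdot \big((X^i_t - X^j_t) - (Y^i_t - Y^j_t)\big) \ge \gamma\, \big|(X^i_t - X^j_t) - (Y^i_t - Y^j_t)\big|^2,$$

whence

$$-\sum_{i,j=1}^N A^{ij}_t \le -\frac{\gamma}{2} \sum_{i,j=1}^N \big|(X^i_t - Y^i_t) - (X^j_t - Y^j_t)\big|^2 \le -2N\gamma^- \sum_{i=1}^N |X^i_t - Y^i_t|^2,$$

where $\gamma^- = \min(\gamma, 0)$.

Then

$$-\sum_{i=1}^N B^i_t = -\sum_{i=1}^N (X^i_t - Y^i_t) \cdot \big(\nabla W * \hat\nu^N_t(Y^i_t) - \nabla W * \mu_t(Y^i_t)\big).$$

Our assumption on $D^2 W$ implies that the Lipschitz norm of $\nabla W(Y^i_t - \cdot)$ is bounded by $\Gamma$; so, by the Kantorovich-Rubinstein dual formulation of $W_1$,

$$\big|\nabla W * (\hat\nu^N_t - \mu_t)(Y^i_t)\big| = \Big|\int \nabla W(Y^i_t - y)\, d(\hat\nu^N_t - \mu_t)(y)\Big| \le \Gamma\, W_1(\hat\nu^N_t, \mu_t).$$

Collecting all terms we finally obtain

$$\frac{1}{2} \frac{d}{dt} \sum_{i=1}^N |X^i_t - Y^i_t|^2 \le -(\beta + 2\gamma^-) \sum_{i=1}^N |X^i_t - Y^i_t|^2 + \Gamma \sum_{i=1}^N |X^i_t - Y^i_t|\; W_1(\hat\nu^N_t, \mu_t).$$

Then, since $\sum_{i=1}^N |X^i_t - Y^i_t| \le \big(N \sum_{i=1}^N |X^i_t - Y^i_t|^2\big)^{1/2}$, the function $y(t) := \big(\frac{1}{N} \sum_{i=1}^N |X^i_t - Y^i_t|^2\big)^{1/2}$ satisfies the differential inequality

$$y'(t) + (\beta + 2\gamma^-)\, y(t) \le \Gamma\, W_1(\hat\nu^N_t, \mu_t),$$

so that

$$\Big(\frac{1}{N} \sum_{i=1}^N |X^i_t - Y^i_t|^2\Big)^{1/2} \le \Gamma \int_0^t e^{-(\beta + 2\gamma^-)(t-s)}\, W_1(\hat\nu^N_s, \mu_s)\, ds.$$

The conclusion follows by triangular inequality again since

$$W_1(\hat\mu^N_t, \hat\nu^N_t) \le W_2(\hat\mu^N_t, \hat\nu^N_t) \le \Big(\frac{1}{N} \sum_{i=1}^N |X^i_t - Y^i_t|^2\Big)^{1/2}.$$

□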


Remark 5.2. Not only does the “second option” in the proof lead to better bounds, it also provides an estimate of the distance between $\hat\mu^N_t$ and $\hat\nu^N_t$ in the $W_2$ distance, which is stronger than the $W_1$ distance. However, we do not take any advantage of this refinement.
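
The chain $W_1 \le W_2 \le (\frac{1}{N} \sum_i |X^i - Y^i|^2)^{1/2}$ used at the end of the proof can be observed on a toy example. The sketch below is only an illustration in dimension one, where the optimal coupling of two empirical measures with equally many atoms is the sorted pairing; the samples `X`, `Y` and the perturbation size are hypothetical and unrelated to the particle systems of the paper.

```python
import random

def w_p_1d(xs, ys, p):
    """W_p between two empirical measures on the real line with equally many
    atoms: the monotone (sorted) pairing is optimal in dimension one."""
    n = len(xs)
    cost = sum(abs(x - y) ** p for x, y in zip(sorted(xs), sorted(ys))) / n
    return cost ** (1.0 / p)

def rms_coupling(xs, ys):
    """((1/N) sum |X^i - Y^i|^2)^(1/2) for the index-by-index coupling."""
    n = len(xs)
    return (sum((x - y) ** 2 for x, y in zip(xs, ys)) / n) ** 0.5

rng = random.Random(0)
X = [rng.gauss(0.0, 1.0) for _ in range(500)]
Y = [x + rng.uniform(-0.3, 0.3) for x in X]   # hypothetical coupled samples
w1, w2, rms = w_p_1d(X, Y, 1), w_p_1d(X, Y, 2), rms_coupling(X, Y)
```

The index-by-index coupling is just one admissible transport plan, which is why its quadratic cost dominates $W_2$, and hence $W_1$.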
6. Conclusion

In this section, we paste together all the estimates established in the previous sections, so as to prove Theorems 1.7 to 1.11.

6.1. Concentration estimates. We start with the proof of Theorem 1.7. By $C$ we shall denote various constants depending on $T$, on our assumptions on $V$ and $W$, and also on $\int e^{a|x|^2}\, d\mu_0(x)$, for some $a > 0$.

From Proposition 5.1,

$$\sup_{0 \le t \le T} W_1(\hat\mu^N_t, \mu_t) \le \big(\Gamma\, T e^{|\alpha| T} + 1\big) \sup_{0 \le t \le T} W_1(\hat\nu^N_t, \mu_t).$$

In particular, there is a constant $C$ such that

$$P\Big[\sup_{0 \le t \le T} W_1(\hat\mu^N_t, \mu_t) > \varepsilon\Big] \le P\Big[\sup_{0 \le t \le T} W_1(\hat\nu^N_t, \mu_t) > \frac{\varepsilon}{C}\Big]. \tag{6.1}$$

From Corollary 3.2 and Theorem 1.1 we know that

$$\sup_{0 \le t \le T} P\big[W_1(\hat\nu^N_t, \mu_t) > \varepsilon\big] \le e^{-K N \varepsilon^2}$$

for all $N \ge N_0 \max(\varepsilon^{-(d'+2)}, 1)$ ($d' > d$). The issue now is to “exchange” sup and $P$ in this estimate. As we shall see, this is authorized by the continuity estimates on $\hat\nu^N_t$ and $\mu_t$.

Let $\Delta > 0$ (to be fixed later on), and let $M$ be the integer part of $T/\Delta + 1$. We decompose the interval $[0, T]$ as

$$[0, T] = [0, \Delta] \cup [\Delta, 2\Delta] \cup \dots \cup [(M-1)\Delta, T] \subset \bigcup_{h=0}^{M-1} [h\Delta, (h+1)\Delta].$$
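
For concreteness, the time grid can be written out as follows. This is a minimal sketch with hypothetical values of $T$ and $\Delta$; exact rational arithmetic is used only so that the covering property can be checked without rounding issues.

```python
from fractions import Fraction
import math

def time_grid(T, delta):
    """Return M = integer part of T/delta + 1 together with the M cells
    [h*delta, (h+1)*delta], h = 0, ..., M-1."""
    M = math.floor(T / delta + 1)
    return M, [(h * delta, (h + 1) * delta) for h in range(M)]

T, delta = Fraction(5), Fraction(3, 10)    # hypothetical values
M, cells = time_grid(T, delta)
```

With these values the last cell starts before $T$ and ends after it, so the $M$ cells cover $[0, T]$ and a union bound over $h$ costs only a factor $M$.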

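Proposition 3.3 guarantees that, if $\Delta \le a\varepsilon^2$ for some $a$ small enough, then

$$h\Delta \le t \le (h+1)\Delta \implies W_1(\mu_t, \mu_{h\Delta}) \le \frac{\varepsilon}{2}.$$

Then, by triangular inequality and (6.1) (applied with $\varepsilon$ replaced by $C\varepsilon$),

$$P\Big[\sup_{0 \le t \le T} W_1(\hat\mu^N_t, \mu_t) > C\varepsilon\Big] \le P\Big[\sup_{0 \le t \le T} W_1(\hat\nu^N_t, \mu_t) > \varepsilon\Big] \le P\Big[\sup_{h=0,\dots,M-1}\ \sup_{h\Delta \le t \le (h+1)\Delta} W_1(\hat\nu^N_t, \mu_t) > \varepsilon\Big]$$

$$\le P\Big[\sup_{h}\ \sup_{h\Delta \le t \le (h+1)\Delta} W_1(\hat\nu^N_t, \hat\nu^N_{h\Delta}) + \sup_{h} W_1(\hat\nu^N_{h\Delta}, \mu_{h\Delta}) + \sup_{h}\ \sup_{h\Delta \le t \le (h+1)\Delta} W_1(\mu_{h\Delta}, \mu_t) > \varepsilon\Big]$$

$$\le P\Big[\sup_{h}\ \sup_{h\Delta \le t \le (h+1)\Delta} W_1(\hat\nu^N_t, \hat\nu^N_{h\Delta}) + \sup_{h} W_1(\hat\nu^N_{h\Delta}, \mu_{h\Delta}) > \frac{\varepsilon}{2}\Big], \tag{6.2}$$

which can be bounded by

$$P\Big[\sup_{h=0,\dots,M-1}\ \sup_{h\Delta \le t \le (h+1)\Delta} W_1(\hat\nu^N_t, \hat\nu^N_{h\Delta}) > \frac{\varepsilon}{4}\Big] + P\Big[\sup_{h=0,\dots,M-1} W_1(\hat\nu^N_{h\Delta}, \mu_{h\Delta}) > \frac{\varepsilon}{4}\Big].$$

By Corollary 3.2 and Theorem 1.1 there exist some constants $C$ and $N_0$ such that

$$P\Big[W_1(\hat\nu^N_{h\Delta}, \mu_{h\Delta}) > \frac{\varepsilon}{4}\Big] \le \exp(-C N \varepsilon^2)$$

for all $h = 0, \dots, M-1$, and $N \ge N_0 \max(\varepsilon^{-(d'+2)}, 1)$. Hence

$$P\Big[\sup_{h=0,\dots,M-1} W_1(\hat\nu^N_{h\Delta}, \mu_{h\Delta}) > \frac{\varepsilon}{4}\Big] \le \sum_{h=0}^{M-1} P\Big[W_1(\hat\nu^N_{h\Delta}, \mu_{h\Delta}) > \frac{\varepsilon}{4}\Big] \le M \exp(-C N \varepsilon^2). \tag{6.3}$$

On the other hand, from Proposition 4.1 (with $a$ denoting the constant appearing there) we deduce

$$P\Big[\sup_{h\Delta \le t \le (h+1)\Delta} W_1(\hat\nu^N_t, \hat\nu^N_{h\Delta}) > \frac{\varepsilon}{4}\Big] \le \exp\Big(-N \Big(\frac{a}{16}\varepsilon^2 - C\Delta\Big)\Big)$$

for all $h = 0, \dots, M-1$ and $\varepsilon > 0$, so

$$P\Big[\sup_{h=0,\dots,M-1}\ \sup_{h\Delta \le t \le (h+1)\Delta} W_1(\hat\nu^N_t, \hat\nu^N_{h\Delta}) > \frac{\varepsilon}{4}\Big] \le M \exp\Big(-N \Big(\frac{a}{16}\varepsilon^2 - C\Delta\Big)\Big). \tag{6.4}$$

We can assume that $\Delta \le \frac{a}{32C}\varepsilon^2$ and $M \le CT/\varepsilon^2 + 1$; then we can bound the right-hand side of (6.4) by

$$M \exp\Big(-N \Big(\frac{a}{16}\varepsilon^2 - C\Delta\Big)\Big) \le C \Big(1 + \frac{T}{\varepsilon^2}\Big) \exp\Big(-\frac{a}{32} N \varepsilon^2\Big). \tag{6.5}$$

From (6.3) and (6.5) we deduce that, for $\Delta$ small enough (depending on $\varepsilon$!),

$$P\Big[\sup_{0 \le t \le T} W_1(\hat\mu^N_t, \mu_t) > C\varepsilon\Big] \le 2C \Big(1 + \frac{T}{\varepsilon^2}\Big) \exp(-K N \varepsilon^2) \tag{6.6}$$

for $N \ge N_0 \max(\varepsilon^{-(d'+2)}, 1)$. Replacing $\varepsilon$ by $\varepsilon/C$, we deduce from (6.6) that

$$P\Big[\sup_{0 \le t \le T} W_1(\hat\mu^N_t, \mu_t) > \varepsilon\Big] \le \exp\Big(\log\Big(C \Big(\frac{T}{\varepsilon^2} + 1\Big)\Big) - K N \varepsilon^2\Big),$$

where again $C$, $K$ stand for various positive constants, and $N \ge N_0 \max(\varepsilon^{-(d'+2)}, 1)$. This concludes the proof of Theorem 1.7.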
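6.2. Uniform in time estimates. Now, we shall focus on the case when $\beta > 0$, $\beta + 2\gamma > 0$, and derive Theorem 1.9 by a slightly refined estimate.

Let us start again from the bound

$$W_1(\hat\mu^N_t, \mu_t) \le \Gamma \int_0^t e^{-\alpha(t-s)}\, W_1(\hat\nu^N_s, \mu_s)\, ds + W_1(\hat\nu^N_t, \mu_t),$$

where $\alpha := \beta + 2\min(\gamma, 0)$ is positive. Let $\Delta > 0$ (to be fixed later on), and $k$ be the integer part of $t/\Delta$. If $W_1(\hat\mu^N_t, \mu_t)$ is larger than $\varepsilon$, then

$$\text{either } W_1(\hat\nu^N_t, \mu_t) > \frac{\varepsilon}{2}, \quad \text{or } \exists j \in \{0, \dots, k\};\ \ \Gamma \int_{j\Delta}^{\min((j+1)\Delta,\, t)} e^{-\alpha(t-s)}\, W_1(\hat\nu^N_s, \mu_s)\, ds > \frac{\varepsilon}{2^{k+2-j}}.$$

Indeed, $\frac{\varepsilon}{2} + \sum_{j=0}^{k} \frac{\varepsilon}{2^{k+2-j}} \le \varepsilon$. As a consequence,

$$\text{either } W_1(\hat\nu^N_t, \mu_t) > \frac{\varepsilon}{2}, \quad \text{or } \exists j \in \{0, \dots, k\};\ \ \sup_{j\Delta \le s \le (j+1)\Delta} W_1(\hat\nu^N_s, \mu_s) > \frac{e^{\alpha[t - (j+1)\Delta]}}{\Gamma \Delta}\, \frac{\varepsilon}{2^{k+2-j}}.$$

Since $t \ge k\Delta$,

$$\frac{e^{\alpha[t - (j+1)\Delta]}}{2^{k+2-j}} \ge \frac{e^{\alpha(k-j-1)\Delta}}{2^{k+2-j}} = \frac{1}{4 e^{\alpha\Delta}} \Big(\frac{e^{\alpha\Delta}}{2}\Big)^{k-j},$$

we conclude to the existence of a constant $C$ such that

$$P\big[W_1(\hat\mu^N_t, \mu_t) > \varepsilon\big] \le P\Big[W_1(\hat\nu^N_t, \mu_t) > \frac{\varepsilon}{2}\Big] + \sum_{j=0}^{k} P\Big[\sup_{j\Delta \le s \le (j+1)\Delta} W_1(\hat\nu^N_s, \mu_s) > C\varepsilon \Big(\frac{e^{\alpha\Delta}}{2}\Big)^{k-j}\Big]. \tag{6.7}$$

We already know that the first term in the right-hand side of (6.7) is bounded by $e^{-\lambda N \varepsilon^2}$ for some constant $\lambda > 0$, and so we focus on the other terms.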
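
The dyadic splitting of $\varepsilon$ used in this argument relies on the elementary fact that the shares $\varepsilon/2$ and $\varepsilon/2^{k+2-j}$, $j = 0, \dots, k$, add up to less than $\varepsilon$ for every $k$. A small Python sketch (an illustration only) checks this in exact arithmetic.

```python
from fractions import Fraction

def dyadic_budget(k):
    """1/2 + sum_{j=0}^{k} 2^{-(k+2-j)}: the total fraction of epsilon spent."""
    return Fraction(1, 2) + sum(Fraction(1, 2 ** (k + 2 - j)) for j in range(k + 1))

budgets = [dyadic_budget(k) for k in range(30)]
```

The total equals $1 - 2^{-(k+2)}$, so the thresholds can be used for arbitrarily large $k$, that is, for arbitrarily large times $t$.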

In the proof of Theorem 1.7 we have established that there are constants $C$ and $\lambda$, depending on $\Delta$ and on bounds on square exponential moments for $\mu_0$, such that

$$P\Big[\sup_{0 \le s \le \Delta} W_1(\hat\nu^N_s, \mu_s) > \delta\Big] \le C \Big(1 + \frac{1}{\delta^2}\Big)\, e^{-\lambda N \delta^2}. \tag{6.8}$$

Proposition 3.1 guarantees that these square exponential bounds also hold true for $\mu_t$, uniformly in $t$. Thus we can apply (6.8) with $\mu_{j\Delta}$ taken as initial datum, and get

$$P\Big[\sup_{j\Delta \le s \le (j+1)\Delta} W_1(\hat\nu^N_s, \mu_s) > \delta\Big] \le C e^{-\lambda N \delta^2} \tag{6.9}$$

as soon as $N \ge N_0 \max(\delta^{-(d'+2)}, 1)$.

We now use (6.9) to bound the sum appearing in the right-hand side of (6.7). Choose $\Delta$ large enough that

$$\theta := \frac{e^{\alpha\Delta}}{2} > 1.$$

Applying (6.9) with $\delta$ replaced by $C\varepsilon\theta^{k-j}$, we can bound the sum in the right-hand side of (6.7) by

$$C \sum_{j=0}^{k} e^{-K N \varepsilon^2 \theta^{2(k-j)}}$$

for $N \ge N_0 \max(\varepsilon^{-(d'+2)}, 1)$, where $C$, $K$ and $N_0$ are again positive constants. Since again $\theta$ is larger than 1, there is a constant $a > 0$ such that $\theta^{2(k-j)} \ge 1 + a(k-j)$, so the sum above is bounded by

$$C e^{-K N \varepsilon^2} \sum_{l=0}^{\infty} e^{-a K N \varepsilon^2 l} = \frac{C e^{-K N \varepsilon^2}}{1 - e^{-a K N \varepsilon^2}}.$$

If $N_0$ is large enough, our assumption $N \ge N_0 \max(\varepsilon^{-(d'+2)}, 1)$ implies that $e^{-a K N \varepsilon^2}$ is always less than 1/2, so that the above sum can be bounded by just $2C e^{-K N \varepsilon^2}$. This concludes the proof of the first point of Theorem 1.9.
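
The summation step can be checked numerically. The sketch below assumes that the summands have the form $\exp(-x\,\theta^{2(k-j)})$ with $x$ playing the role of $K N \varepsilon^2$ and $\theta > 1$, and uses the elementary inequality $\theta^{2m} \ge 1 + a m$ with $a = 2\log\theta$ (one admissible choice of the constant $a$). The values of `x` and `theta` are hypothetical.

```python
import math

def tail_sum(x, theta, k):
    """sum_{j=0}^{k} exp(-x * theta^(2(k-j))) for theta > 1."""
    return sum(math.exp(-x * theta ** (2 * (k - j))) for j in range(k + 1))

def geometric_bound(x, theta):
    """exp(-x) / (1 - exp(-a*x)) with a = 2*log(theta)."""
    a = 2.0 * math.log(theta)
    return math.exp(-x) / (1.0 - math.exp(-a * x))

x, theta = 1.5, 1.3      # hypothetical values
checks = [(tail_sum(x, theta, k), geometric_bound(x, theta)) for k in (0, 1, 5, 50)]
```

The bound does not depend on $k$, which is exactly what makes the resulting estimate uniform in time.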

The second point is proved by writing

$$W_1(\hat\mu^N_t, \mu_\infty) \le W_1(\hat\mu^N_t, \mu_t) + W_1(\mu_t, \mu_\infty) \le W_1(\hat\mu^N_t, \mu_t) + C e^{-\lambda t},$$

successively by the triangular inequality for Wasserstein distance and use of Proposition 3.8. Then the result follows from the uniform estimate obtained above.

6.3. Data reconstruction. We finally consider Theorem 1.11. Proposition 3.6 ensures that, as $t \to \infty$, $f_t$ is uniformly bounded in $H^k(\mathbb{R}^d)$, where $k$ is arbitrarily large. Since $f_t$ converges to $f_\infty$ as $t \to \infty$, we deduce that $f_\infty$ is Lipschitz. Then Theorem 1.9 and Proposition 2.1 together imply Theorem 1.11.

Appendix A. Metric entropy of a probability space

We now prove the covering result used in Section 2.1, as a particular case of a more general estimate. Let $E$ be a Polish space; we look for an upper bound on the number $\mathcal{N}_p(E, \delta) := m(\mathcal{P}(E), \delta)$ of balls of radius $\delta$ in Wasserstein distance $W_p$ needed to cover the space $\mathcal{P}(E)$ of probability measures on $E$. We use the same strategy as in [12, Exercise 6.2.19], where the Lévy distance is used instead of the Wasserstein distance.

Theorem A.1. Let $(E, d)$ be a Polish space with finite diameter $D$. For any $r > 0$, define $N(E, r)$ as the minimal number of balls needed to cover $E$ by balls of radius $r$. Then there exists a numerical constant $C$ such that for all $p \ge 1$ and $\delta \in (0, D)$, the space $\mathcal{P}(E)$ can be covered by $\mathcal{N}_p(E, \delta)$ balls of radius $\delta$ in $W_p$ distance, with

$$\mathcal{N}_p(E, \delta) \le \Big(\frac{C D}{\delta}\Big)^{p\, N(E, \delta/2)}. \tag{A.1}$$

Remark A.2. The $W_p$ distance between any two probability measures on $E$ is at most $D$, so, for all $\delta \ge D$, we have the trivial estimate $\mathcal{N}_p(E, \delta) = 1$.

Proof. Let $r > 0$, and let $\{x_j\}_{1 \le j \le N(E,r)}$ be such that $E$ is covered by the balls $B(x_j, r)$ with centers $x_j \in E$ and radius $r$. For simplicity we shall write $N = N(E, r)$.

In a first step we prove that for any $\mu \in \mathcal{P}(E)$ there exist nonnegative real numbers $(\beta_j)_{1 \le j \le N}$ with $\sum_{j=1}^N \beta_j = 1$, such that

$$W_p\Big(\mu, \sum_{j=1}^N \beta_j \delta_{x_j}\Big) \le r.$$

For this we first replace the balls $B(x_j, r)$ by the sets $B_j$ defined by

$$\forall j, \quad B_j = B(x_j, r) \setminus \bigcup_{k \le j-1} B(x_k, r),$$

so that $E$ is partitioned into the $B_j$'s. Next define

$$\beta_j = \mu[B_j].$$

It is easy to check that the required properties are fulfilled. Indeed, we may transport $\mu$ onto $\tilde\mu = \sum_{j=1}^N \beta_j \delta_{x_j}$ by sending all $x$'s in $B_j$ onto $x_j$, for each $j = 1, \dots, N$; the cost of this transport is bounded by $\sum_{j=1}^N \beta_j r^p = r^p$.
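
The first step can be illustrated on the toy space $E = [0, 1]$. In the sketch below (an illustration under stated assumptions, not the general construction) the centers are $r, 3r, 5r, \dots$, the disjoint cells are half-open intervals of length $2r$, and each sample point is sent to the center of its cell; the resulting transport cost is an upper bound for $W_p$ between the empirical measure of the sample and its quantized version. The sample and the value of `r` are hypothetical.

```python
import math
import random

def nearest_center(x, r):
    """Center of the cell of length 2r containing x, for x in [0, 1]."""
    n_cells = math.ceil(1.0 / (2.0 * r))
    j = min(int(x / (2.0 * r)), n_cells - 1)
    return (2 * j + 1) * r

def quantization_cost(points, r, p):
    """((1/n) sum |x - center(x)|^p)^(1/p): cost of sending each point to its center."""
    cost = sum(abs(x - nearest_center(x, r)) ** p for x in points) / len(points)
    return cost ** (1.0 / p)

rng = random.Random(1)
sample = [rng.random() for _ in range(1000)]
r = 0.05
costs = {p: quantization_cost(sample, r, p) for p in (1, 2, 4)}
```

Every point moves by at most $r$, so the cost is at most $r$ for each $p$, in agreement with the bound $r^p$ on the $p$-th power of the cost.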

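In the second step we introduce an integer $K$ (whose value will be made more precise later on), and consider the set

$$\mathcal{C}_K = \Big\{ \sum_{j=1}^N a_j \delta_{x_j};\ (a_j)_{1 \le j \le N} \in \mathcal{A}_K \Big\} \subset \mathcal{P}(E),$$

where $\mathcal{A}_K$ is the set of all $N$-tuples $(a_j)_{1 \le j \le N}$ such that each $a_j$ is of the form $k_j/K$, $k_j \in \mathbb{N}$, and $\sum_{j=1}^N a_j = 1$.

Given a probability measure $\tilde\mu = \sum_{j=1}^N \beta_j \delta_{x_j}$ (where $(\beta_j)_j$ does not necessarily belong to $\mathcal{A}_K$), there exists $\mu'$ in $\mathcal{C}_K$ such that

$$W_p(\tilde\mu, \mu') \le D \Big(\frac{N}{K}\Big)^{1/p}. \tag{A.2}$$

To prove (A.2), we define $n_j$ as the integer part $[K\beta_j]$ of $K\beta_j$ and $J$ as the first integer such that

$$\sum_{j=1}^{J} (n_j + 1) + \sum_{j=J+1}^{N} n_j = K.$$

Since $\sum_{j=1}^N \beta_j = 1$, it is clear that $J \le N$. Then we define a measure $\mu' \in \mathcal{C}_K$ by $\mu' = \sum_{j=1}^N a_j \delta_{x_j}$, where

$$a_j = \frac{n_j + 1}{K} \ \text{ for } j \le J, \qquad a_j = \frac{n_j}{K} \ \text{ for } j = J+1, \dots, N.$$

Let us bound the distance between $\tilde\mu$ and $\mu'$. For that we gradually define a transport plan between $\tilde\mu$ and $\mu'$ in the following way: first of all, at each point $x_i$, the mass $n_i/K$ stays in place. Then, the remaining masses $\beta_j - n_j/K$ are redistributed as follows: all the remaining mass at $x_1, \dots, x_\ell$ is brought to $x_1$, together with possibly a bit of mass at $x_{\ell+1}$, until a total mass $1/K$ has been added at location $x_1$ (for $\ell$ large enough). If $J \ge 2$, then we again bring mass from $x_{\ell+1}, \dots$ until another mass $1/K$ has been added at $x_2$. We carry on until all the mass at $x_N$ has been used, thus building a transport plan $(\pi_{ij})_{1 \le i,j \le N}$ which sends $\tilde\mu$ onto $\mu'$, in such a way that $\pi_{ii} \ge n_i/K$ for all $i$. Hence,

$$\sum_{j \ne i} \pi_{ij} = \beta_i - \pi_{ii} \le \beta_i - \frac{n_i}{K} \le \frac{1}{K},$$

and this plan yields an upper bound on the Wasserstein distance:

$$W_p(\tilde\mu, \mu')^p \le \sum_{i,j=1}^N d(x_i, x_j)^p\, \pi_{ij} = \sum_{i=1}^N \sum_{j \ne i} d(x_i, x_j)^p\, \pi_{ij} \le D^p \sum_{i=1}^N \sum_{j \ne i} \pi_{ij} \le D^p\, \frac{N}{K}.$$

To summarize the first two steps: for any $\mu$ in $\mathcal{P}(E)$ there exists $\mu' \in \mathcal{C}_K$ such that

$$W_p(\mu, \mu') \le r + D \Big(\frac{N}{K}\Big)^{1/p}.$$

In other words, the family $\big(B(\mu', r + D(N/K)^{1/p})\big)_{\mu' \in \mathcal{C}_K}$ covers $\mathcal{P}(E)$.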
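
The rounding of the weights in the second step can be made explicit. The following sketch (with a hypothetical weight vector `beta` and a hypothetical value of $K$) computes the integer parts $n_j = [K\beta_j]$, the index $J$ and the rounded weights $a_j$, in exact rational arithmetic.

```python
from fractions import Fraction
import math

def round_weights(beta, K):
    """Round weights beta_j (summing to 1) to multiples of 1/K:
    n_j = floor(K*beta_j), and the first J weights receive an extra 1/K,
    where J = K - sum_j n_j makes the rounded weights sum to 1."""
    n = [math.floor(K * b) for b in beta]
    J = K - sum(n)
    weights = [Fraction(n_j + (1 if j < J else 0), K) for j, n_j in enumerate(n)]
    return weights, J

beta = [Fraction(3, 10), Fraction(1, 7), Fraction(2, 5), Fraction(11, 70)]  # hypothetical
a, J = round_weights(beta, K=9)
```

Each weight changes by at most $1/K$, which is the source of the factor $N/K$ in the transport cost between the original and the rounded measure.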
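In the third step we choose some suitable $K$ and $r$ for a given $\delta$.

We first choose $K$ in such a way that $r$ and $D(N/K)^{1/p}$ have the same order of magnitude, for instance

$$K = \Big[ N \Big(\frac{D}{r}\Big)^p \Big] + 1.$$

Then

$$r + D \Big(\frac{N}{K}\Big)^{1/p} \le 2r,$$

and the balls $B(\mu', r + D(N/K)^{1/p})$ have radius at most $\delta$ if

$$r = \frac{\delta}{2}.$$

Now $K$ and $r$ are fixed, $N = N(E, \delta/2)$, and we just have to estimate the cardinality $\#\mathcal{C}_K$ of $\mathcal{C}_K$. Without loss of generality, we have assumed $\delta < D$, so $K \ge N$. We first note that

$$\#\mathcal{C}_K \le \#\mathcal{A}_K = \frac{(K+N-1)!}{K!\,(N-1)!} \le \frac{(K+N-1)!}{(K-1)!\,N!} = \frac{(K+N-1) \cdots K}{N!},$$

where the second inequality uses $K \ge N$. Then $K \le \dots \le K+N-1 \le 2K$, and hence

$$\#\mathcal{C}_K \le \frac{(2K)^N}{N!} \le \Big(\frac{2Ke}{N}\Big)^N.$$

Since $N \ge 1$ and $2D \ge \delta$, we can write

$$K \le N \Big(\frac{2D}{\delta}\Big)^p + 1 \le 2N \Big(\frac{2D}{\delta}\Big)^p,$$

and we deduce

$$\#\mathcal{C}_K \le \Big(\frac{C D}{\delta}\Big)^{p\, N(E, \delta/2)}$$

with $C = 2(4e)^{1/p} \le 8e$.

Consequently, we have covered $\mathcal{P}(E)$ by at most $\big(\frac{C D}{\delta}\big)^{p\, N(E, \delta/2)}$ balls with radius $\delta$. This concludes the argument.

□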
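
The counting step of the proof can be verified by brute force for small parameters. The sketch below enumerates all $N$-tuples of nonnegative integers summing to $K$ (these index the weights $k_j/K$) and compares the count with the binomial expressions and with the cruder bound $(2Ke/N)^N$. The values of $N$ and $K$ are hypothetical and chosen small so that the enumeration is cheap.

```python
from itertools import product
from math import comb, e

def count_grid_weights(N, K):
    """Number of N-tuples (k_1, ..., k_N) of nonnegative integers with sum K."""
    return sum(1 for ks in product(range(K + 1), repeat=N) if sum(ks) == K)

N, K = 3, 7                                # hypothetical values with K >= N
brute = count_grid_weights(N, K)
exact = comb(K + N - 1, N - 1)             # stars and bars
product_form = comb(K + N - 1, N)          # (K+N-1)...K / N!
crude_bound = (2 * K * e / N) ** N
```

For these values the exact count is 36, while the product form gives 84; both are far below the crude bound, which is all that the covering estimate requires.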


In the particular case when $E$ is the Euclidean ball $B_R$ of radius $R$ in $\mathbb{R}^d$, we have

$$N(B_R, r) \le k \Big(\frac{R}{r}\Big)^d \tag{A.3}$$

for some constant $k$. To see this, one may for instance consider the balls with centers in a lattice of mesh proportional to $r$ in $\mathbb{R}^d$. Then Theorem A.1 (applied with $D = 2R$) yields the bound

$$\mathcal{N}_p(B_R, \delta) \le \Big(\frac{C R}{\delta}\Big)^{p\, k\, (2R/\delta)^d},$$

which is used in the present paper.


Appendix B. Regularity estimates on the limit PDE 

In this appendix we study solutions to the limit equation

$$\partial_t \mu = \Delta \mu + \nabla \cdot \big(\mu \nabla (V + W * \mu)\big), \qquad t > 0, \tag{B.1}$$

and establish the regnlarity results stated in Proposition EIHl Following the method in |ldj . 
we shall measure the regularity in terms of L^-Sobolev spaces 

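$$H^s(\mathbb{R}^d) = \big\{ u \in L^2(\mathbb{R}^d);\ \partial^\alpha u \in L^2(\mathbb{R}^d) \ \ \forall \alpha \in \mathbb{N}^d,\ |\alpha| \le s \big\} \qquad (s \in \mathbb{N}).$$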

Our main result is as follows. 


Theorem B.1. Let $V$ and $W$ be such that all their partial derivatives $\partial^\alpha V$ and $\partial^\alpha W$ are continuous and grow at most polynomially at infinity, for any multi-index $\alpha \in \mathbb{N}^d$ with $|\alpha| \le s + 1$. Let $a, E > 0$ and let $\mu_0$ be a probability density such that

$$\int e^{a|x|^2}\, d\mu_0(x) \le E.$$

Then, there exists a continuous function $f : (0, +\infty) \to (0, +\infty)$, only depending on $d$, $s$, $V$, $W$, $a$ and $E$, such that any classical solution $\mu = \mu(t, x)$ to (B.1), starting from $\mu_0$, satisfies

$$\|\mu(t, \cdot)\|_{H^s(\mathbb{R}^d)} \le f(t).$$

Proof. For the sake of simplicity we only give a formal proof, which can be turned rigorous by means of regularization arguments.

Let then $\mu = (\mu(t, \cdot))_{t \ge 0}$ be a solution of

$$\partial_t \mu = \Delta \mu + \nabla \cdot \big(\mu \nabla (V + W * \mu)\big), \qquad t > 0,\ x \in \mathbb{R}^d;$$

we rewrite the equation as

$$\partial_t \mu = \sum_{i=1}^d \partial_{ii} \mu + \partial_i [\mu\, \partial_i \phi],$$

where $\partial_i = \partial^{e_i}$ if $e_i$ is the $i$-th vector of the canonical basis of $\mathbb{R}^d$, and

$$\phi(t, x) = V(x) + W * \mu(t, x).$$
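Let $\alpha \in \mathbb{N}^d$ be given. By integration by parts and Cauchy-Schwarz inequality,

$$\frac{1}{2} \frac{d}{dt} \int_{\mathbb{R}^d} |\partial^\alpha \mu|^2 = \int_{\mathbb{R}^d} \partial^\alpha \mu\, \partial^\alpha (\partial_t \mu) = \sum_{i=1}^d \int_{\mathbb{R}^d} \partial^\alpha \mu\, \partial^\alpha \big(\partial_{ii} \mu + \partial_i [\mu\, \partial_i \phi]\big)$$

$$= -\sum_{i=1}^d \int_{\mathbb{R}^d} |\partial^{\alpha + e_i} \mu|^2 - \sum_{i=1}^d \int_{\mathbb{R}^d} \partial^{\alpha + e_i} \mu\, \partial^\alpha [\mu\, \partial_i \phi]$$

$$\le -\sum_{i=1}^d \int_{\mathbb{R}^d} |\partial^{\alpha + e_i} \mu|^2 + \sum_{i=1}^d \Big(\int_{\mathbb{R}^d} |\partial^{\alpha + e_i} \mu|^2\Big)^{1/2} \Big(\int_{\mathbb{R}^d} \big|\partial^\alpha [\mu\, \partial_i \phi]\big|^2\Big)^{1/2}$$

$$\le -\frac{1}{2} \sum_{i=1}^d \int_{\mathbb{R}^d} |\partial^{\alpha + e_i} \mu|^2 + C \sum_{i=1}^d \sum_{\beta \le \alpha} \int_{\mathbb{R}^d} \big|\partial^{\alpha - \beta + e_i} \phi\, \partial^\beta \mu\big|^2.$$

By summing over $\alpha \in \mathbb{N}^d$ with $|\alpha| \le s$, we find

$$\frac{d}{dt} \sum_{|\alpha| \le s} \int_{\mathbb{R}^d} |\partial^\alpha \mu|^2 \le -\sum_{|\alpha| \le s} \sum_{i=1}^d \int_{\mathbb{R}^d} |\partial^{\alpha + e_i} \mu|^2 + \sum_{|\alpha| \le s} \sum_{i=1}^d \sum_{\beta \le \alpha} C^{\alpha,\beta} \int_{\mathbb{R}^d} \big|\partial^{\alpha - \beta + e_i} \phi\, \partial^\beta \mu\big|^2.$$

Given $T > 0$, by Proposition 3.1 there exist constants $\bar a$ and $\bar E$, depending only on $d$, $a$, $E$ and $T$, such that

$$\int e^{\bar a |x|^2}\, d\mu(t, x) \le \bar E \tag{B.2}$$

for all $t \in [0, T]$. In particular, it follows from our assumptions on the derivatives of $V$ and $W$ that all terms $\partial^{\alpha - \beta + e_i} \phi$ are bounded by some polynomial in $|x|$, uniformly in $t \in [0, T]$.

Let $\langle x \rangle := \sqrt{1 + |x|^2}$. For $k, s \ge 0$, we introduce the weighted norms

$$\|u\|_{H^s_k} := \Big( \sum_{|\alpha| \le s} \int_{\mathbb{R}^d} \langle x \rangle^{k}\, |\partial^\alpha u(x)|^2\, dx \Big)^{1/2}$$

and

$$\|u\|_{L^1_k} := \int_{\mathbb{R}^d} \langle x \rangle^{k}\, |u(x)|\, dx.$$

Then for any $s \in \mathbb{N}$ and $T > 0$ there exist $k$ and $C > 0$ such that

$$0 \le t \le T \implies \frac{d}{dt} \|\mu\|^2_{H^s} \le -\|\mu\|^2_{H^{s+1}} + C\, \|\mu\|^2_{H^s_k}. \tag{B.3}$$

We shall prove later on the following interpolation lemma: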
Lemma B.2. Given $d \ge 1$, $s \in \mathbb{N}$ and $k \ge 0$, there exist nonnegative constants $C(d, s, k)$ and $h(d, s, k)$, and $\theta(d, s) \in (0, 1)$ such that for all $u \in L^1_{h(d,s,k)} \cap H^{s+1}$,

$$\|u\|_{H^s_k} \le C(d, s, k)\, \|u\|^{1 - \theta(d,s)}_{L^1_{h(d,s,k)}}\, \|u\|^{\theta(d,s)}_{H^{s+1}}.$$

Then, again from (B.2), all norms $\|\mu\|_{L^1_h}$ are bounded on $[0, T]$, so from (B.3) and Lemma B.2 there exist some constants $C$ such that

$$\frac{d}{dt} \|\mu\|^2_{H^s} \le -\|\mu\|^2_{H^{s+1}} + C\, \|\mu\|^{2\theta}_{H^{s+1}} \le -\frac{1}{2} \|\mu\|^2_{H^{s+1}} + C \le -C\, \|\mu\|^{2/\theta}_{H^s} + C.$$

In other words $A(t) = \|\mu\|^2_{H^s}(t)$ satisfies on $[0, T]$ the differential inequality

$$A'(t) + c A(t)^p \le C \tag{B.4}$$

for some constants $c, C > 0$ and $p = 1/\theta > 1$ depending only on $d$, $a$, $E$, $s$ and $T$.

Let us distinguish two cases. If $A(0) \le 1$, then we only use the inequality $A'(t) \le C$ to make sure that

$$A(t) \le A(0) + Ct \le 1 + CT$$

for any $t \in [0, T]$.

If on the other hand $A(0) > 1$, we deduce from (B.4) that

$$A'(t) + c A(t)^p \le C A(t),$$

as long as $A(t) \ge 1$, so that $D(t) := A(t)^{1-p}$ satisfies the inequality

$$D'(t) + (p-1) C D(t) \ge (p-1) c,$$

which integrates to

$$D(t) \ge D(0) e^{(1-p)Ct} + \frac{c}{C} \big(1 - e^{(1-p)Ct}\big) \ge \frac{c}{C} \big(1 - e^{(1-p)Ct}\big).$$

As a consequence, as long as $A(t) \ge 1$, we have

$$A(t) \le (c/C)^{1/(1-p)} \big(1 - e^{(1-p)Ct}\big)^{1/(1-p)}.$$

In the end, we have obtained an a priori bound on $A(t) = \sum_{|\alpha| \le s} \int |\partial^\alpha \mu|^2 (t)$ for $t \in (0, T]$, depending only on $d$, $s$, $a$, $E$ and $T$, but not on the initial value $A(0)$. Then the proof can be concluded by an approximation argument. □


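Proof of Lemma B.2. We proceed by induction on $s$.

In the first step we prove the result for $s = 0$. Given $d \ge 1$ and $a \in (0, 1]$, we write

$$\int_{\mathbb{R}^d} \langle x \rangle^{k}\, |u(x)|^2\, dx = \int_{\mathbb{R}^d} \big(\langle x \rangle^{k/a}\, |u(x)|\big)^{a}\, |u(x)|^{2-a}\, dx,$$

so, by Hölder's inequality,

$$\|u\|^2_{L^2_k} \le \|u\|^{a}_{L^1_{k/a}}\, \|u\|^{2-a}_{L^{(2-a)/(1-a)}}$$

(with $\frac{2-a}{1-a} = \infty$ if $a = 1$). Then by Sobolev embedding,

$$\|u\|^2_{L^2_k} \le C(d, a)\, \|u\|^{a}_{L^1_{k/a}}\, \|u\|^{2-a}_{H^1},$$

where $a = 1$ if $d = 1$, $a$ is arbitrary in $(0, 1)$ if $d = 2$, and $a = \frac{4}{d+2}$ if $d \ge 3$, that is,

$$\|u\|_{L^2_k} \le C(d)\, \|u\|^{1 - \theta(d)}_{L^1_{k/(2(1-\theta(d)))}}\, \|u\|^{\theta(d)}_{H^1},$$

where $\theta(1) = \frac{1}{2}$, any $\theta(2) \in (\frac{1}{2}, 1)$ for $d = 2$, and $\theta(d) = \frac{d}{d+2}$ for $d \ge 3$.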
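
The exponent bookkeeping in the case $d \ge 3$ can be checked mechanically. The sketch below assumes the relations $a = 4/(d+2)$, Hölder exponent $q = (2-a)/(1-a)$ and $\theta = 1 - a/2$, and verifies in exact arithmetic that $q$ coincides with the critical Sobolev exponent $2d/(d-2)$ of the embedding $H^1 \subset L^q$ and that $\theta = d/(d+2)$.

```python
from fractions import Fraction

def sobolev_exponents(d):
    """For d >= 3: a = 4/(d+2), q = (2-a)/(1-a), theta = 1 - a/2."""
    a = Fraction(4, d + 2)
    q = (2 - a) / (1 - a)
    theta = 1 - a / 2
    return a, q, theta

table = {d: sobolev_exponents(d) for d in range(3, 8)}
```

For instance $d = 3$ gives $a = 4/5$, $q = 6$ and $\theta = 3/5$.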

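In the second step we let $s \ge 1$ and assume by induction that there exist some constants $C(d, s-1, k)$, $h(d, s-1, k) \ge 0$ and $\theta(d, s-1) \in (0, 1)$ such that for all $u \in L^1_{h(d,s-1,k)} \cap H^{s}$,

$$\|u\|_{H^{s-1}_k} \le C(d, s-1, k)\, \|u\|^{1 - \theta(d,s-1)}_{L^1_{h(d,s-1,k)}}\, \|u\|^{\theta(d,s-1)}_{H^{s}}.$$

Let then $u \in L^1(\mathbb{R}^d) \cap H^{s+1}$.

Given $\alpha \in \mathbb{N}^d$ with $|\alpha| = j$ and $1 \le j \le s$, we split $\alpha$ into $\alpha = \alpha_1 + \alpha_2$ with $|\alpha_2| = 1$, and integrate by parts:

$$\|\partial^\alpha u\|^2_{L^2_k} \le k\, \|\partial^{\alpha_1} u\|_{L^2_{2k}}\, \|\partial^\alpha u\|_{L^2} + \|\partial^{\alpha_1} u\|_{L^2_{2k}}\, \|\partial^{\alpha + \alpha_2} u\|_{L^2} \le (k+1)\, \|\partial^{\alpha_1} u\|_{L^2_{2k}} \sup_{|\alpha'| \le s+1} \|\partial^{\alpha'} u\|_{L^2},$$

whence

$$\sup_{|\alpha| = j} \|\partial^\alpha u\|^2_{L^2_k} \le (k+1) \sup_{|\alpha| = j-1} \|\partial^\alpha u\|_{L^2_{2k}} \sup_{|\alpha| \le s+1} \|\partial^\alpha u\|_{L^2} \le (k+1) \sup_{|\alpha| \le s-1} \|\partial^\alpha u\|_{L^2_{2k}} \sup_{|\alpha| \le s+1} \|\partial^\alpha u\|_{L^2}.$$

Since this holds for any $1 \le j \le s$ we obtain

$$\sup_{1 \le |\alpha| \le s} \|\partial^\alpha u\|^2_{L^2_k} \le (k+1) \sup_{|\alpha| \le s-1} \|\partial^\alpha u\|_{L^2_{2k}} \sup_{|\alpha| \le s+1} \|\partial^\alpha u\|_{L^2}.$$

Moreover

$$\|u\|^2_{L^2_k} \le \|u\|_{L^2_{2k}}\, \|u\|_{L^2} \le \sup_{|\alpha| \le s-1} \|\partial^\alpha u\|_{L^2_{2k}} \sup_{|\alpha| \le s+1} \|\partial^\alpha u\|_{L^2},$$

so that finally

$$\|u\|^2_{H^s_k} \le (k+1)\, \|u\|_{H^{s-1}_{2k}}\, \|u\|_{H^{s+1}}.$$

Then, by induction hypothesis,

$$\|u\|^2_{H^s_k} \le (k+1)\, C(d, s-1, 2k)\, \|u\|^{1 - \theta(d,s-1)}_{L^1_{h(d,s-1,2k)}}\, \|u\|^{\theta(d,s-1)}_{H^{s}}\, \|u\|_{H^{s+1}},$$

whence

$$\|u\|_{H^s_k} \le C(d, k, s)\, \|u\|^{1 - \theta(d,s)}_{L^1_{h(d,s,k)}}\, \|u\|^{\theta(d,s)}_{H^{s+1}}.$$

This concludes the argument.

□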